Portfolio Assignment 1, Methods 3, 2021, autumn semester

Assignment 1: Using mixed effects modelling to model hierarchical data

In this assignment we will be investigating the politeness dataset of Winter and Grawunder (2012) and apply basic methods of multilevel modelling.

#this is the dataset we will be exploring in this exercise
politeness <- read.csv('politeness.csv') ## read in data

Exercise 1 - describing the dataset and making some initial plots

  1. Describe the dataset, such that someone who happened upon this dataset could understand the variables and what they contain

Exercise 1, part 1– (EH)

The politeness dataset contains the data obtained from the study of Korean formal and informal speech which investigated the fundamental frequency of male and female participants’ speech in a variety of formal and informal scenarios.

The following table describes the variables in the dataset:

Variable Description
subject participant ID
gender participant’s gender
scenario the experimental scenario from 1 to 7 such as “asking a favour”
attitude either ‘inf’ for informal stimuli or ‘pol’ for formal stimuli
total_duration duration of participant’s response in seconds
f0mn mean fundamental frequency (f0) of the participant’s speech
hiss_count number of times the participants made a noisy breath intake

Remark: The gender, scenario and attitude variables should be encoded as factors as they show a categorical function withing this dataset . In addition, these variables have non-unique values across participants, and are not ordered.

#Encoding some of the variables as factors (gender, attitude, and scenario)

politeness$attitude <- as.factor(politeness$attitude)
politeness$gender <- as.factor(politeness$gender)
politeness$scenario <- as.factor(politeness$scenario)

Exercise 1, part 2–(EH)

  1. Create a new data frame that just contains the subject F1 and run two linear models; one that expresses f0mn as dependent on scenario as an integer; and one that expresses f0mn as dependent on scenario encoded as a factor
    1. Include the model matrices, X from the General Linear Model, for these two models in your report and describe the different interpretations of scenario that these entail
    2. Which coding of scenario, as a factor or not, is more fitting?
# Create a subset dataframe for subject F1 only
pf1 <- politeness[politeness$subject == "F1", ]
pf1
##    subject gender scenario attitude total_duration  f0mn hiss_count
## 1       F1      F        1      pol         18.392 214.6          2
## 2       F1      F        1      inf         13.551 210.9          0
## 3       F1      F        2      pol          5.217 284.7          0
## 4       F1      F        2      inf          4.247 265.6          0
## 5       F1      F        3      pol          6.791 210.6          0
## 6       F1      F        3      inf          4.126 285.6          0
## 7       F1      F        4      pol          6.244 251.5          1
## 8       F1      F        4      inf          3.245 281.5          0
## 9       F1      F        5      pol          5.625 229.6          1
## 10      F1      F        5      inf          3.950 250.5          0
## 11      F1      F        6      pol         28.508 181.1          1
## 12      F1      F        6      inf         55.159 229.3          0
## 13      F1      F        7      inf         60.309 219.8          2
## 14      F1      F        7      pol         40.825 175.8          0
# make model predicting f0mn by scenario (integer)
m1<- lm(f0mn ~ as.integer(scenario), data = pf1)

# get model matrix
mm1 <- model.matrix(m1)

# make model predicting f0mn by scenario (factor)
m2 <- lm(f0mn ~ as.factor(scenario), data = pf1)

# get model matrix
mm2 <- model.matrix(m2)
Here is the model using "scenario" encoded as an integer
summary(m1)
## 
## Call:
## lm(formula = f0mn ~ as.integer(scenario), data = pf1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.836 -36.807   6.686  20.918  46.421 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           262.621     20.616  12.738 2.48e-08 ***
## as.integer(scenario)   -6.886      4.610  -1.494    0.161    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.5 on 12 degrees of freedom
## Multiple R-squared:  0.1568, Adjusted R-squared:  0.0865 
## F-statistic: 2.231 on 1 and 12 DF,  p-value: 0.1611
mm1
##    (Intercept) as.integer(scenario)
## 1            1                    1
## 2            1                    1
## 3            1                    2
## 4            1                    2
## 5            1                    3
## 6            1                    3
## 7            1                    4
## 8            1                    4
## 9            1                    5
## 10           1                    5
## 11           1                    6
## 12           1                    6
## 13           1                    7
## 14           1                    7
## attr(,"assign")
## [1] 0 1

And here is the model using “scenario” encoded as a factor

summary(m2)
## 
## Call:
## lm(formula = f0mn ~ as.factor(scenario), data = pf1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -37.50 -13.86   0.00  13.86  37.50 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            212.75      20.35  10.453  1.6e-05 ***
## as.factor(scenario)2    62.40      28.78   2.168   0.0668 .  
## as.factor(scenario)3    35.35      28.78   1.228   0.2591    
## as.factor(scenario)4    53.75      28.78   1.867   0.1041    
## as.factor(scenario)5    27.30      28.78   0.948   0.3745    
## as.factor(scenario)6    -7.55      28.78  -0.262   0.8006    
## as.factor(scenario)7   -14.95      28.78  -0.519   0.6195    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.78 on 7 degrees of freedom
## Multiple R-squared:  0.6576, Adjusted R-squared:  0.364 
## F-statistic:  2.24 on 6 and 7 DF,  p-value: 0.1576
mm2
##    (Intercept) as.factor(scenario)2 as.factor(scenario)3 as.factor(scenario)4
## 1            1                    0                    0                    0
## 2            1                    0                    0                    0
## 3            1                    1                    0                    0
## 4            1                    1                    0                    0
## 5            1                    0                    1                    0
## 6            1                    0                    1                    0
## 7            1                    0                    0                    1
## 8            1                    0                    0                    1
## 9            1                    0                    0                    0
## 10           1                    0                    0                    0
## 11           1                    0                    0                    0
## 12           1                    0                    0                    0
## 13           1                    0                    0                    0
## 14           1                    0                    0                    0
##    as.factor(scenario)5 as.factor(scenario)6 as.factor(scenario)7
## 1                     0                    0                    0
## 2                     0                    0                    0
## 3                     0                    0                    0
## 4                     0                    0                    0
## 5                     0                    0                    0
## 6                     0                    0                    0
## 7                     0                    0                    0
## 8                     0                    0                    0
## 9                     1                    0                    0
## 10                    1                    0                    0
## 11                    0                    1                    0
## 12                    0                    1                    0
## 13                    0                    0                    1
## 14                    0                    0                    1
## attr(,"assign")
## [1] 0 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$`as.factor(scenario)`
## [1] "contr.treatment"

Conclusion: The above output shows the difference in model matrices between scenario encoded as an integer and factor. The integer version treats scenario as a continuous variable, whereas the factorized version creates a regression line per scenario.

For this dataset, scenario should be a factor, since the scenarios are not a continuous variable and depending on the prescribed scenario, the participants may have a different f0 (mean fundamental frequency of speech), and we are interested in following the trajectory of f0 across scenarios not as a variable that consistently decreases or increases, but a separate regression line showing the changes in f0 between the 7 different scenarios. And in order, to be able to see that crucial difference we need to consider the ‘scenario’ variable as a factor when we run a model predicting f0 across scenarios.

Exercise 1, part 3 – (EH)

  1. Make a plot that includes a subplot for each subject that has scenario on the x-axis and f0mn on the y-axis and where points are colour coded according to attitude
    1. Describe the differences between subjects
politeness %>% ggplot(aes(scenario, f0mn, color = attitude)) +
    geom_point() +
    facet_wrap(vars(subject))
## Warning: Removed 12 rows containing missing values (geom_point).

We can visually observe that there are baseline differences between the male and female subjects’ mean fundamental frequency of speech, where the males’ f0 is consistently lower, across scenario and attitude. Between the different scenarios there is variability in the f0 values for both male and female subjects depending on both the scenario type and the attitude (informal or formal). There is a consistent tendency across scenario type and gender for the mean fundamental frequency of speech to be slightly higher when the attitude is informal as opposed to formal. This visual information inerpreted from this plot is consistent with the results of Winter and Grawunder (2012)

##Exercise 2 - comparison of models

mixed.model <- lmer(formula=..., data=...)
example.formula <- formula(dep.variable ~ first.level.variable + (1 | second.level.variable))

###Part 1 – (VK) 1) Build four different models and do some comparisons

# the single level model
m3 <- lm(formula = f0mn ~ gender, data = politeness)

# a two-level model where each scenario has a unique intercept
m4 <- lmer(formula = f0mn ~ gender + (1 | scenario), data = politeness)
# a two-level model that has models subject as intercept
m5 <- lmer(formula = f0mn ~ gender + (1 | subject), data = politeness)

# a two-level model that incorporate intercepts for both subject and scenario 
m6 <- lmer(formula = f0mn ~ gender +
            (1 | subject) + (1 | scenario), data = politeness)

#comparing AIC and Deviance values for all the models
AIC(m3)
## [1] 2163.971
AIC(m4)
## [1] 2152.314
AIC(m5)
## [1] 2099.626
AIC(m6)
## [1] 2092.482
deviance(m3)
## [1] 327033.6
deviance(m4)
## Warning in deviance.merMod(m4): deviance() is deprecated for REML fits;
## use REMLcrit for the REML criterion or deviance(.,REML=FALSE) for deviance
## calculated at the REML fit
## [1] 2144.314
deviance(m5)
## Warning in deviance.merMod(m5): deviance() is deprecated for REML fits;
## use REMLcrit for the REML criterion or deviance(.,REML=FALSE) for deviance
## calculated at the REML fit
## [1] 2091.626
deviance(m6)
## Warning in deviance.merMod(m6): deviance() is deprecated for REML fits;
## use REMLcrit for the REML criterion or deviance(.,REML=FALSE) for deviance
## calculated at the REML fit
## [1] 2082.482
anova(m4, m5, m6)
## refitting model(s) with ML (instead of REML)
## Data: politeness
## Models:
## m4: f0mn ~ gender + (1 | scenario)
## m5: f0mn ~ gender + (1 | subject)
## m6: f0mn ~ gender + (1 | subject) + (1 | scenario)
##    npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)   
## m4    4 2162.3 2175.7 -1077.1   2154.3                         
## m5    4 2112.1 2125.5 -1052.0   2104.1 50.2095  0              
## m6    5 2105.2 2122.0 -1047.6   2095.2  8.8725  1   0.002895 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
piecewiseSEM::rsquared(c(m4, m5, m6))
##   Response   family     link method  Marginal Conditional
## 1     f0mn gaussian identity   none 0.6779555   0.6967788
## 2     f0mn gaussian identity   none 0.6681651   0.7899229
## 3     f0mn gaussian identity   none 0.6677206   0.8077964

The single level model performs the worst and this makes sense as we do not expect all participants to have the same f0 as their voices have naturally occurring differences (not just ones predicted by gender). There are differences that might be explained by the scenario or individual subject (as we observed in the plot above), however, neither of those are taken into account when using a single level model.

So then using two-level models that explain variance by either taking scenario or subject as random intercepts, is definitely an improvement to help explain more of the scenario/attitude based differences, not just the gender differences.

Consequently, Of the three multi-level models,it is model m6, which includes random intercepts for both subject and scenario,that has the most explained variance with for the entire model R20.81 or 81%.

Additionally, we see that model m6 has the lowest AIC and deviance.

Exercise 2, part 2 and 3 – (LR)

  1. Why is our single-level model bad?
    1. create a new data frame that has three variables, subject, gender and f0mn, where f0mn is the average of all responses of each subject, i.e. averaging across attitude and_scenario_
    2. build a single-level model that models f0mn as dependent on gender using this new dataset
    3. make Quantile-Quantile plots, comparing theoretical quantiles to the sample quantiles) using qqnorm and qqline for the new single-level model and compare it to the old single-level model (from 1).i). Which model’s residuals (ϵ) fulfil the assumptions of the General Linear Model better?)
    4. Also make a quantile-quantile plot for the residuals of the multilevel model with two intercepts. Does it look alright?
  2. Plotting the two-intercepts model
    1. Create a plot for each subject, (similar to part 3 in Exercise 1), this time also indicating the fitted value for each of the subjects for each for the scenarios (hint use fixef to get the “grand effects” for each gender and ranef to get the subject- and scenario-specific effects)
# scenario x f0mn y, attitude = color
ff <- fixef(m6)
ff
## (Intercept)     genderM 
##    246.7650   -115.1746
rf <- ranef(m6)
rf <- as.data.frame(rf)
rf
##      grpvar        term grp    condval   condsd
## 1   subject (Intercept)  F1 -10.490356 8.280794
## 2   subject (Intercept)  F2  10.251809 8.280794
## 3   subject (Intercept)  F3   3.795129 8.280794
## 4   subject (Intercept)  F4 -29.495270 9.095403
## 5   subject (Intercept)  F5  47.093999 8.280794
## 6   subject (Intercept)  F6 -18.396273 8.794359
## 7   subject (Intercept)  F7  -6.976691 8.280794
## 8   subject (Intercept)  F8  -8.521934 8.280794
## 9   subject (Intercept)  F9  12.739587 8.280794
## 10  subject (Intercept)  M1  21.052117 8.280794
## 11  subject (Intercept)  M2  -5.462358 9.453009
## 12  subject (Intercept)  M3  33.561535 8.280794
## 13  subject (Intercept)  M4  16.093337 8.524543
## 14  subject (Intercept)  M5 -28.267430 8.280794
## 15  subject (Intercept)  M6 -12.640202 8.794541
## 16  subject (Intercept)  M7 -24.336998 8.280794
## 17 scenario (Intercept)   1 -11.595496 5.488728
## 18 scenario (Intercept)   2   5.321218 5.532205
## 19 scenario (Intercept)   3   6.795658 5.586194
## 20 scenario (Intercept)   4  11.348815 5.578013
## 21 scenario (Intercept)   5   1.411037 5.488705
## 22 scenario (Intercept)   6  -8.622136 5.489258
## 23 scenario (Intercept)   7  -4.659096 5.488058
politeness$effect_gender <- 0.0
politeness[politeness$gender == "F", ]$effect_gender <- ff[1]
politeness[politeness$gender == "M", ]$effect_gender <- ff[1] + ff[2]
politeness$intercept_subject <- left_join(politeness, rf, by = c("subject" = "grp"), copy = TRUE, keep = FALSE)$condval
politeness$intercept_scenario <- left_join(politeness, rf, by = c("scenario" = "grp"), copy = TRUE, keep = FALSE)$condval
politeness$predicted <- politeness$effect_gender + politeness$intercept_subject + politeness$intercept_scenario
politeness %>% ggplot(aes(scenario, f0mn, color = attitude)) +
    geom_point() +
    geom_point(aes(y = predicted, shape = "fitted values"), color = "black", size = 2) +
    scale_shape_manual(name = "model", values = c(18)) +
    facet_wrap(vars(subject))
## Warning: Removed 12 rows containing missing values (geom_point).

deviance(m3)
## [1] 327033.6
deviance(m4, REML = FALSE)
## [1] 2154.33
deviance(m5, REML = FALSE)
## [1] 2104.175
deviance(m6, REML = FALSE)
## [1] 2095.279
politeness_aggregated <- politeness[!is.na(politeness$f0mn), ] %>% group_by(subject) %>% summarize(subject = subject[1], gender = gender[1], f0mn = mean(f0mn))
politeness_aggregated
## # A tibble: 16 x 3
##    subject gender  f0mn
##    <chr>   <fct>  <dbl>
##  1 F1      F       235.
##  2 F2      F       258.
##  3 F3      F       251.
##  4 F4      F       212.
##  5 F5      F       299.
##  6 F6      F       225.
##  7 F7      F       239.
##  8 F8      F       237.
##  9 F9      F       261.
## 10 M1      M       155.
## 11 M2      M       122.
## 12 M3      M       169.
## 13 M4      M       150.
## 14 M5      M       100.
## 15 M6      M       118.
## 16 M7      M       104.
m7 <- lm(f0mn ~ gender, data = politeness_aggregated)
par(mfrow=c(1,2))
qqnorm(fitted.values(m7))
qqline(fitted.values(m7))
qqnorm(fitted.values(m2))
qqline(fitted.values(m2))

par(mfrow=c(1,1))
qqnorm(fitted.values(m6))
qqline(fitted.values(m6), col = "red")

Assessing the QQ-plots of the single-level models it seems that the aggregated model m7’s residuals are worse off than those of model m2. The residuals of model m2 are better dispersed along the line - however it still doesn’t look fantastic. The QQ-plot of the multilevel model m6 looks better than any of the single level ones, with data points closer to the line and more evenly dispersed on both sides of the line.

Looking at the plot for the observed and the fitted values it looks as if the model m6 performs reasonably as well.

Exercise 3 - now with attitude

Exercise 3, part 1–(VK) 1) Carry on with the model with the two unique intercepts fitted (scenario and subject) but now build a new model that has attitude as a main effect besides gender. After create a separate model that besides the main effects of attitude and gender also include their interaction

#making a model with two unicue intercepts and having attitude and gender as main effects
m8 <- lmer(formula = f0mn ~ gender + attitude + (1 | subject) + (1 | scenario), data = politeness)
summary(m8)
## Linear mixed model fit by REML ['lmerMod']
## Formula: f0mn ~ gender + attitude + (1 | subject) + (1 | scenario)
##    Data: politeness
## 
## REML criterion at convergence: 2065.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8511 -0.6081 -0.0602  0.4329  3.8745 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 585.6    24.20   
##  scenario (Intercept) 106.7    10.33   
##  Residual             882.7    29.71   
## Number of obs: 212, groups:  subject, 16; scenario, 7
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  254.398      9.597  26.507
## genderM     -115.437     12.881  -8.962
## attitudepol  -14.819      4.096  -3.618
## 
## Correlation of Fixed Effects:
##             (Intr) gendrM
## genderM     -0.587       
## attitudepol -0.220  0.006
fitted.values(m8)
##         1         2         3         4         5         6         7         8 
## 216.47929 231.29853 234.27048 249.08973 236.09087 250.91011 241.21606 256.03530 
##         9        10        11        12        13        14        15        16 
## 230.41934 245.23858 219.58594 234.40518 238.54395 223.72471 237.34031 252.15956 
##        17        18        19        20        21        22        23        24 
## 255.13151 269.95075 256.95189 271.77114 262.07708 276.89632 251.28036 266.09960 
##        25        26        27        28        29        30        31        32 
## 240.44696 255.26621 259.40498 244.58573 230.84663 245.66588 248.63783 263.45707 
##        33        34        35        36        37        38        39        40 
## 250.45822 265.27746 255.58340 270.40264 244.78668 259.60593 233.95328 248.77253 
##        41        42        43        44        45        46        47        49 
## 252.91130 238.09206 198.00782 212.82706 215.79901 230.61826 217.61940 222.74458 
##        51        52        53        54        55        57        58        59 
## 211.94787 226.76711 201.11447 215.93371 220.07248 274.39362 289.21286 292.18481 
##        60        61        62        63        64        65        66        67 
## 307.00406 294.00520 308.82444 299.13038 313.94963 288.33367 303.15291 277.50027 
##        68        69        70        71        72        73        74        75 
## 292.31951 296.45828 281.63904 209.70329 224.52253 227.49448 242.31372 229.31487 
##        76        77        79        81        82        83        84        85 
## 244.13411 234.44005 223.64333 212.80994 227.62918 231.76795 216.94871 220.01309 
##        86        87        88        89        90        91        92        93 
## 234.83233 237.80428 252.62353 239.62467 254.44391 244.74985 259.56910 233.95314 
##        94        95        96        97        98        99       100       101 
## 248.77238 223.11974 237.93898 242.07775 227.25851 218.45899 233.27823 236.25019 
##       102       103       104       105       106       107       108       109 
## 251.06943 238.07057 252.88982 243.19576 258.01500 232.39904 247.21828 221.56564 
##       110       111       112       113       114       115       116       117 
## 236.38488 240.52366 225.70441 239.84235 254.66159 257.63354 272.45278 259.45393 
##       118       119       120       121       122       123       124       125 
## 274.27317 264.57911 279.39835 253.78240 268.60164 242.94900 257.76824 261.90701 
##       126       127       128       129       130       131       132       133 
## 247.08777 133.00242 147.82166 150.79361 165.61285 152.61400 167.43324 157.73918 
##       134       135       136       137       138       139       140       141 
## 172.55843 146.94247 161.76171 136.10907 150.92831 155.06708 140.24784 107.79414 
##       142       143       147       149       150       151       152       153 
## 122.61338 125.58533 132.53090 121.73419 136.55343 110.90079 125.72003 129.85880 
##       154       155       156       157       158       159       160       161 
## 115.03956 145.58352 160.40276 163.37471 178.19395 165.19510 180.01434 170.32028 
##       162       163       164       165       166       167       168       169 
## 185.13952 159.52357 174.34281 148.69017 163.50941 167.64818 152.82894 127.46793 
##       170       171       172       173       174       175       176       177 
## 142.28717 145.25913 160.07837 147.07951 161.89876 152.20470 167.02394 141.40798 
##       178       180       181       182       183       184       185       186 
## 156.22722 145.39382 149.53260 134.71335  83.40025  98.21950 101.19145 116.01069 
##       187       188       189       190       191       192       193       194 
## 103.01184 117.83108 108.13702 122.95626  97.34030 112.15955  86.50690 101.32615 
##       195       196       198       200       201       202       203       204 
## 105.46492  90.64568 112.78128 130.57248 117.57362 132.39287 122.69881 137.51805 
##       205       206       207       208       209       210       211       212 
## 111.90209 126.72133 101.06869 115.88793 120.02671 105.20746  87.35321 102.17245 
##       213       214       215       216       217       218       219       220 
## 105.14440 119.96365 106.96479 121.78403 112.08998 126.90922 101.29326 116.11250 
##       221       222       223       224 
##  90.45986 105.27910 109.41787  94.59863
par(mfrow=c(1,2))
qqnorm(fitted.values(m8))
qqline(fitted.values(m8), col = "red")
qqnorm(politeness$f0mn)
qqline(politeness$f0mn, col = "red")

plot(m8)

#a model that additionally includes the interaction between the main effects (attitude and gender)
m9 <- lmer(formula = f0mn ~ gender + attitude + gender:attitude + (1 | subject) + (1 | scenario), data = politeness)
summary(m9)
## Linear mixed model fit by REML ['lmerMod']
## Formula: f0mn ~ gender + attitude + gender:attitude + (1 | subject) +  
##     (1 | scenario)
##    Data: politeness
## 
## REML criterion at convergence: 2058.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8120 -0.5884 -0.0645  0.4014  3.9100 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 584.4    24.17   
##  scenario (Intercept) 106.4    10.32   
##  Residual             885.5    29.76   
## Number of obs: 212, groups:  subject, 16; scenario, 7
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          255.618      9.761  26.186
## genderM             -118.232     13.531  -8.738
## attitudepol          -17.192      5.423  -3.170
## genderM:attitudepol    5.544      8.284   0.669
## 
## Correlation of Fixed Effects:
##             (Intr) gendrM atttdp
## genderM     -0.606              
## attitudepol -0.286  0.206       
## gndrM:tttdp  0.187 -0.309 -0.654
The model m9 can be read as following: The intercept for women/inf are $ \approx 256 Hz$, when we look at men with the same attitude their pitch drops by $ \approx 118 Hz $. Overall a polite attitude will result in a drop in pitch by $ \approx 17 Hz$, however for men it will only be $ -17.2+5.5 \approx 11.6 Hz$. Korean women's relative drop in pitch is therefore larger than male's in a polite situation according to this sample. 

Exercise 3, part 2–(KV)

  1. Compare the three models (1. gender as a main effect; 2. gender and attitude as main effects; 3. gender and attitude as main effects and the interaction between them.
#comparing the models
anova(m6, m8, m9)
## refitting model(s) with ML (instead of REML)
## Data: politeness
## Models:
## m6: f0mn ~ gender + (1 | subject) + (1 | scenario)
## m8: f0mn ~ gender + attitude + (1 | subject) + (1 | scenario)
## m9: f0mn ~ gender + attitude + gender:attitude + (1 | subject) + 
## m9:     (1 | scenario)
##    npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
## m6    5 2105.2 2122.0 -1047.6   2095.2                          
## m8    6 2094.5 2114.6 -1041.2   2082.5 12.6868  1  0.0003683 ***
## m9    7 2096.0 2119.5 -1041.0   2082.0  0.4551  1  0.4998998    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
piecewiseSEM::rsquared(c(m6, m8, m9))
##   Response   family     link method  Marginal Conditional
## 1     f0mn gaussian identity   none 0.6677206   0.8077964
## 2     f0mn gaussian identity   none 0.6782542   0.8196777
## 3     f0mn gaussian identity   none 0.6782490   0.8192531

Exercise 3, part 3–(KV) 3) Choose the model that you think describe the data the best - and write a short report on the main findings based on this model.

summary(m8)
## Linear mixed model fit by REML ['lmerMod']
## Formula: f0mn ~ gender + attitude + (1 | subject) + (1 | scenario)
##    Data: politeness
## 
## REML criterion at convergence: 2065.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8511 -0.6081 -0.0602  0.4329  3.8745 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 585.6    24.20   
##  scenario (Intercept) 106.7    10.33   
##  Residual             882.7    29.71   
## Number of obs: 212, groups:  subject, 16; scenario, 7
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  254.398      9.597  26.507
## genderM     -115.437     12.881  -8.962
## attitudepol  -14.819      4.096  -3.618
## 
## Correlation of Fixed Effects:
##             (Intr) gendrM
## genderM     -0.587       
## attitudepol -0.220  0.006

This dataset consists of the basic demographic information of 16 Korean participants, and their observed pitch in different situations that require either an informal or polite(formal) attitude.

The model we chose as the best is m8. Consistent with the conclusions of the original oauthor’s study, this model showed that women on average have a higher pitch than men BUT it also suggested a negative relationship between attitude and pitch with p-values<0.001. It would seem that both Korean men’s and women’s frequency of voice drops when having a polite attitude.

Subjects and scenarios should have different intercepts because it would be assumed they would all have different baselines, (and therefore need different intercepts to account for this). Different subjects will naturally already speak at a different pitch level, so separate intercepts allows us to account for these differences. The different scenarios may also need separate intercepts as certain scenarios may result in participants lowering or raising their pitch to meet the appropriate ambiance of the scenario. Once again, by including separate intercepts for scenarios then we should be accounting for these differences in our models.

Furthermore the output of the summary function shows that more variance is explained by the random effect of the subject than that of the scenario, further strengthening the choice of multilevel modelling.

And here’s a QQ-plot of our chosen model

qqnorm(fitted.values(m8))
qqline(fitted.values(m8), col = "red")

Portfolio Assignment 2 part 1, Methods 3, 2021, autumn semester

Portfolio Assignment 2 part 1, Methods 3, 2021, autumn semester

Exercises and objectives

The objectives of the exercises of this assignment are:
1) Download and organise the data and model and plot staircase responses based on fits of logistic functions
2) Fit multilevel models for response times
3) Fit multilevel models for count data

REMEMBER: In your report, make sure to include code that can reproduce the answers requested in the exercises below (MAKE A KNITTED VERSION)
REMEMBER: This assignment will be part of your final portfolio

Exercise 1

Go to https://osf.io/ecxsj/files/ and download the files associated with Experiment 2 (there should be 29).
The data is associated with Experiment 2 of the article at the following DOI https://doi.org/10.1016/j.concog.2019.03.007

  1. Put the data from all subjects into a single data frame
  2. Describe the data and construct extra variables from the existing variables
    1. add a variable to the data frame and call it correct (have it be a logical variable). Assign a 1 to each row where the subject indicated the correct answer and a 0 to each row where the subject indicated the incorrect answer (Hint: the variable obj.resp indicates whether the subject answered “even”, e or “odd”, o, and the variable target_type indicates what was actually presented.
    2. describe what the following variables in the data frame contain, trial.type, pas, trial, target.contrast, cue, task, target_type, rt.subj, rt.obj, obj.resp, subject and correct. (That means you can ignore the rest of the variables in your description). For each of them, indicate and argue for what class they should be classified into, e.g. factor, numeric etc.
    3. for the staircasing part only, create a plot for each subject where you plot the estimated function (on the target.contrast range from 0-1) based on the fitted values of a model (use glm) that models correct as dependent on target.contrast. These plots will be our no-pooling model. Comment on the fits - do we have enough data to plot the logistic functions?
    4. on top of those plots, add the estimated functions (on the target.contrast range from 0-1) for each subject based on partial pooling model (use glmer from the package lme4) where unique intercepts and slopes for target.contrast are modelled for each subject
    5. in your own words, describe how the partial pooling model allows for a better fit for each subject

Answers

Exercise 1, part 1 - EH

# get list of all CSV files
temp <- list.files(path = "./experiment_2/",
    pattern = "*.csv", full.names = TRUE)
# load into single data frame
samples <- map_df(temp, read_csv, trim_ws = TRUE, na = c("", "NA"),
    col_types = cols(
        trial.type = col_factor(),
        pas = col_factor(),
        trial = col_factor(),
        jitter.x = col_double(),
        jitter.y = col_double(),
        odd.digit = col_integer(),
        target.contrast = col_double(),
        target.frames = col_double(),
        cue = col_factor(),
        task = col_factor(),
        target.type = col_factor(),
        rt.subj = col_double(),
        rt.obj = col_double(),
        even.digit = col_integer(),
        seed = col_double(),
        obj.resp = col_factor(),
        subject = col_factor()
    ))
rm(temp)
# peek data
head(samples)
## # A tibble: 6 x 17
##   trial.type pas   trial jitter.x jitter.y odd.digit target.contrast
##   <fct>      <fct> <fct>    <dbl>    <dbl>     <int>           <dbl>
## 1 staircase  4     0      -0.343    0.449          9             1  
## 2 staircase  4     1       0.0623   0.0291         9             1  
## 3 staircase  4     2      -0.406    0.500          7             0.9
## 4 staircase  4     3      -0.362   -0.222          7             0.9
## 5 staircase  4     4       0.289    0.413          7             0.8
## 6 staircase  4     5       0.0824  -0.0934         7             0.8
## # ... with 10 more variables: target.frames <dbl>, cue <fct>, task <fct>,
## #   target.type <fct>, rt.subj <dbl>, rt.obj <dbl>, even.digit <int>,
## #   seed <dbl>, obj.resp <fct>, subject <fct>
# total number of samples
nrow(samples)
## [1] 18131

Exercise 1, part 2 - KV

# Add column to indicate if participant response was correct
samples <- mutate(samples,
    correct = as.logical(
        ifelse(substr(target.type, 1, 1) == obj.resp, 1, 0)
    )
)

The dataset contains the following variables:

Variable Description Class
trial.type either staircase(practice) or experiment factor: categorical, reused
pas any number from 1-4, indicating the reported experience on the Perceptual Awareness Scale factor: categorical, not continuous
trial trial number zero-indexed for the practice and experiment blocks factor: categorical, not continuous
target.contrast the grey-scale proportion of the target digit double: numeric, continous variable from 0-1
cue number code for pre-stimulus cue factor: categorical, not continuous
task how many numbers could be shown; has the levels quadruplet: (all numbers); pairs: (2 even and 2 odd numbers); singles: (1 even and 1 odd number) factor: categorical, no continuous relationship
target_type whether the target shown was the chosen even.digit or the chosen odd.digit factor: categorical, no continuous relationship
rt.subj reaction time (seconds) on the PAS response double: continous variable with decimal places
rt.obj reaction time (seconds) on the target digit double: continous variable with decimal places
obj.resp the key actually pressed e for even and o for odd factor: categorical, no continuous relationship
subject subject number factor: categorical
correct Whether answer was correct(1) or incorret (0) logical, boolean either true or false
# only use staircase trial types.
samples_staircase <- samples %>% filter(trial.type == "staircase")
#making a no-pooling model
m1 <- glm(correct ~ target.contrast + subject,
    data = samples_staircase,
    family = binomial(link = "logit"))
# We should see an intercept per subject
summary(m1)
## 
## Call:
## glm(formula = correct ~ target.contrast + subject, family = binomial(link = "logit"), 
##     data = samples_staircase)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2225   0.1836   0.7262   0.7907   0.9772  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      1.237950   0.310228   3.990 6.59e-05 ***
## target.contrast  3.446129   0.439005   7.850 4.17e-15 ***
## subject002      -0.200038   0.396544  -0.504   0.6139    
## subject003      -0.126797   0.367239  -0.345   0.7299    
## subject004      -0.447583   0.341680  -1.310   0.1902    
## subject005      -0.729034   0.342889  -2.126   0.0335 *  
## subject006       0.204637   0.412265   0.496   0.6196    
## subject007      -0.257607   0.397907  -0.647   0.5174    
## subject008      -0.728765   0.345702  -2.108   0.0350 *  
## subject009      -0.600476   0.339050  -1.771   0.0766 .  
## subject010      -0.450820   0.353083  -1.277   0.2017    
## subject011      -0.003327   0.353925  -0.009   0.9925    
## subject012      -0.751795   0.339166  -2.217   0.0267 *  
## subject013      -0.251208   0.354408  -0.709   0.4784    
## subject014      -0.279840   0.345979  -0.809   0.4186    
## subject015      -0.330359   0.349908  -0.944   0.3451    
## subject016      -0.781364   0.340545  -2.294   0.0218 *  
## subject017      -0.691345   0.339823  -2.034   0.0419 *  
## subject018      -0.546979   0.340267  -1.607   0.1079    
## subject019       0.502524   0.488717   1.028   0.3038    
## subject020      -0.271168   0.356758  -0.760   0.4472    
## subject021      -0.399562   0.346301  -1.154   0.2486    
## subject022       0.523698   0.488385   1.072   0.2836    
## subject023      -0.065056   0.359321  -0.181   0.8563    
## subject024      -0.531046   0.341990  -1.553   0.1205    
## subject025      -0.160067   0.426526  -0.375   0.7075    
## subject026      -0.222237   0.374445  -0.594   0.5528    
## subject027      -0.487602   0.342189  -1.425   0.1542    
## subject028      -0.547571   0.339972  -1.611   0.1073    
## subject029      -0.335918   0.355318  -0.945   0.3445    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6127.2  on 5602  degrees of freedom
## Residual deviance: 5936.1  on 5573  degrees of freedom
## AIC: 5996.1
## 
## Number of Fisher Scoring iterations: 6
# plot the fitted values per subject
samples_staircase %>%
    ggplot(aes(target.contrast, as.integer(correct), color = correct)) +
    geom_point() +
    geom_line(aes(target.contrast, fitted(m1),
        linetype = "fitted values"), inherit.aes = FALSE) +
    scale_linetype_manual(name = "single-level model", values = c("dashed")) +
    facet_wrap(~subject) +
    labs(title = "Correct answers by target contrast per participant",
        subtitle = "With single-level model fitted values (no-pooling)",
        color = "subject correct") +
    ylab("Correct (probability)") +
    xlab("target contrast")

Ex1.p2.iii Comment on the fits - do we have enough data to plot the logistic functions?

The fits are not great. We can see that the lower contrast seems to result in more incorrect answers but even so, subjects still get correct answer at low contrast. As this is a logical outcome variable, the probabilities for most subjects start around 75%. If the contrast is 0.00, we would expect that the subject would not be able to correctly identify the target stimulus, and should start at 50% (chance level). It does seem as though there is sufficient data, the minimum number of samples for a subject is 70, although there are not the same number of samples for each contrast value.

m2 <- glmer(correct ~ target.contrast + (1 + target.contrast | subject),
    data = samples_staircase,
    family = binomial(link = "logit"))
summary(m2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ target.contrast + (1 + target.contrast | subject)
##    Data: samples_staircase
## 
##      AIC      BIC   logLik deviance df.resid 
##   5988.5   6021.6  -2989.2   5978.5     5598 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7671  0.0068  0.5532  0.5915  0.9264 
## 
## Random effects:
##  Groups  Name            Variance Std.Dev. Corr 
##  subject (Intercept)      0.2717  0.5213        
##          target.contrast 42.7577  6.5389   -0.84
## Number of obs: 5603, groups:  subject, 29
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       0.5619     0.1580   3.557 0.000376 ***
## target.contrast   8.7132     2.3604   3.691 0.000223 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## trgt.cntrst -0.907
samples_staircase %>%
    ggplot(aes(target.contrast, as.integer(correct), color = correct)) +
    geom_point() +
    geom_line(aes(target.contrast, fitted(m2),
        linetype = "fitted values"), inherit.aes = FALSE) +
    scale_linetype_manual(name = "multi-level model", values = c("dashed")) +
    facet_wrap(~subject) +
    labs(title = "Correct answers by target contrast per participant",
        subtitle = "With multi-level model fitted values (partial-pooling)",
        color = "subject correct") +
    ylab("Correct (probability)") +
    xlab("target contrast")

Ex1.p2. v. in your own words, describe how the partial pooling model allows for a better fit for each subject

It allows for per-subject differences in their response to the target contrast. If some subjects have more difficulty seeing differences in low contrast images, we would expect their accuracy to decrease more than other subjects as the contrast values becomes lower. We would conceptually expect no-pooling models to create a better fit for each subject(?), as the subjective models are modelled solely on a given subject. However we would of course always choose the partial pooling models as these would give us great subject-fits but also a model generalizabel for the population.

Exercise 2

Now we only look at the experiment trials (trial.type)

  1. Pick four subjects and plot their Quantile-Quantile (Q-Q) plots for the residuals of their objective response times (rt.obj) based on a model where only intercept is modelled
    1. comment on these
    2. does a log-transformation of the response time data improve the Q-Q-plots?
  2. Now do a partial pooling model modelling objective response times as dependent on task? (set REML=FALSE in your lmer-specification)
    1. which would you include among your random effects and why? (support your choices with relevant measures, taking into account variance explained and number of parameters going into the modelling)

    2. explain in your own words what your chosen models says about response times between the different tasks

  3. Now add pas and its interaction with task to the fixed effects
    1. how many types of group intercepts (random effects) can you add without ending up with convergence issues or singular fits?
    2. create a model by adding random intercepts (without modelling slopes) that results in a singular fit - then use print(VarCorr(<your.model>), comp='Variance') to inspect the variance vector - explain why the fit is singular (Hint: read the first paragraph under details in the help for isSingular)
    3. in your own words - how could you explain why your model would result in a singular fit?

Answers

Exercise 2, part 1 - VK

We will use subjects 002, 009, 014 and 015.

samples_experiment <- samples %>% filter(
    trial.type == "experiment" &
    (
        subject == "002" |
        subject == "009" |
        subject == "014" |
        subject == "015"
    ))
m3 <- lmer(rt.obj ~ (1 | subject),
    data = samples_experiment)
summary(m3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: rt.obj ~ (1 | subject)
##    Data: samples_experiment
## 
## REML criterion at convergence: 11774.5
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.214 -0.098 -0.063 -0.019 39.793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept)  0.1621  0.4026  
##  Residual             53.2091  7.2945  
## Number of obs: 1728, groups:  subject, 4
## 
## Fixed effects:
##             Estimate Std. Error    df t value Pr(>|t|)  
## (Intercept)    1.239      0.267 3.000    4.64   0.0189 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
samples_experiment$m3_resid <- residuals(m3)
# make 2x2 plot grid
par(mfrow = c(2, 2))
# plot qq plots for objective response time of subjects:
# 002
qqnorm(samples_experiment[samples_experiment$subject == "002", ]$m3_resid)
qqline(samples_experiment[samples_experiment$subject == "002", ]$m3_resid,
    col = "red")
# 009
qqnorm(samples_experiment[samples_experiment$subject == "009", ]$m3_resid)
qqline(samples_experiment[samples_experiment$subject == "009", ]$m3_resid,
    col = "red")
# 014
qqnorm(samples_experiment[samples_experiment$subject == "014", ]$m3_resid)
qqline(samples_experiment[samples_experiment$subject == "014", ]$m3_resid,
    col = "red")
# 015
qqnorm(samples_experiment[samples_experiment$subject == "015", ]$m3_resid)
qqline(samples_experiment[samples_experiment$subject == "015", ]$m3_resid,
    col = "red")

Ex2.p1.i

These data show a significant tail with some reaction times going far beyond the mean (e.g. 15 seconds +).

m3l <- lmer(log(rt.obj) ~ (1 | subject),
    data = samples_experiment)
summary(m3l)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log(rt.obj) ~ (1 | subject)
##    Data: samples_experiment
## 
## REML criterion at convergence: 4775.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.4491 -0.4344  0.0168  0.4933  6.4941 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 0.08853  0.2975  
##  Residual             0.92008  0.9592  
## Number of obs: 1728, groups:  subject, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)  -0.3916     0.1505  3.0000  -2.601   0.0803 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
samples_experiment$m3l_resid <- residuals(m3l)
# plot qq plots for log transformed objective response time of subjects:
# 002
qqnorm(log(samples_experiment[samples_experiment$subject == "002", ]$m3l_resid))
## Warning in log(samples_experiment[samples_experiment$subject == "002", ]
## $m3l_resid): NaNs produced
qqline(log(samples_experiment[samples_experiment$subject == "002", ]$m3l_resid),
    col = "red")
## Warning in log(samples_experiment[samples_experiment$subject == "002", ]
## $m3l_resid): NaNs produced

# 009
qqnorm(log(samples_experiment[samples_experiment$subject == "009", ]$m3l_resid))
## Warning in log(samples_experiment[samples_experiment$subject == "009", ]
## $m3l_resid): NaNs produced
qqline(log(samples_experiment[samples_experiment$subject == "009", ]$m3l_resid),
    col = "red")
## Warning in log(samples_experiment[samples_experiment$subject == "009", ]
## $m3l_resid): NaNs produced

# 014
qqnorm(log(samples_experiment[samples_experiment$subject == "014", ]$m3l_resid))
## Warning in log(samples_experiment[samples_experiment$subject == "014", ]
## $m3l_resid): NaNs produced
qqline(log(samples_experiment[samples_experiment$subject == "014", ]$m3l_resid),
    col = "red")
## Warning in log(samples_experiment[samples_experiment$subject == "014", ]
## $m3l_resid): NaNs produced

# 015
qqnorm(log(samples_experiment[samples_experiment$subject == "015", ]$m3l_resid))
## Warning in log(samples_experiment[samples_experiment$subject == "015", ]
## $m3l_resid): NaNs produced
qqline(log(samples_experiment[samples_experiment$subject == "015", ]$m3l_resid),
    col = "red")
## Warning in log(samples_experiment[samples_experiment$subject == "015", ]
## $m3l_resid): NaNs produced

# reset plot grid
par(mfrow = c(1, 1))

Ex2.p1.ii

Log-transforming the data definitely creates a better fit, as can be seen in the qq-plots.

Exercise 2, part 2 - VK

m4 <- lmer(rt.obj ~ task + (1 | subject),
    REML = FALSE, data = samples_experiment)
summary(m4)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: rt.obj ~ task + (1 | subject)
##    Data: samples_experiment
## 
##      AIC      BIC   logLik deviance df.resid 
##  11780.8  11808.0  -5885.4  11770.8     1723 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.258 -0.111 -0.058 -0.010 39.780 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept)  0.09096 0.3016  
##  Residual             53.12390 7.2886  
## Number of obs: 1728, groups:  subject, 4
## 
## Fixed effects:
##                 Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)       1.6487     0.3391   18.4270   4.863 0.000117 ***
## taskquadruplet   -0.6554     0.4295 1724.0000  -1.526 0.127187    
## tasksingles      -0.5736     0.4295 1724.0000  -1.336 0.181883    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr
## taskqudrplt -0.633       
## tasksingles -0.633  0.500
piecewiseSEM::rsquared(m4)
##   Response   family     link method    Marginal Conditional
## 1   rt.obj gaussian identity   none 0.001596197  0.00330281

The above model uses random intercepts for subject. We considered that task and trial might be good additional random effects, but these result in a singular fit. We considered task to be a relevant random effect, as it was expected that the different tasks would influence the reaction time. We considered trial to be a relevant random effect as participants might perform better throughout trials.

Overall this model does not explain much of the variance (0.33% about 0.16% fixed effects and 0.17% random effects). The model shows no statistically significant difference between response times per task.

Exercise 2, part 3 - LR

# model with PAS-task interaction.
m5 <- lmer(rt.obj ~ task + pas:task + (1 | subject),
    REML = FALSE, data = samples_experiment)
summary(m5)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: rt.obj ~ task + pas:task + (1 | subject)
##    Data: samples_experiment
## 
##      AIC      BIC   logLik deviance df.resid 
##  11796.6  11872.9  -5884.3  11768.6     1714 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.310 -0.100 -0.054 -0.005 39.752 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept)  0.08247 0.2872  
##  Residual             53.06137 7.2843  
## Number of obs: 1728, groups:  subject, 4
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)            0.8696     1.7248 1487.2418   0.504    0.614
## taskquadruplet        -0.2514     2.5048 1727.9918  -0.100    0.920
## tasksingles           -0.2318     2.1467 1726.1882  -0.108    0.914
## taskpairs:pas3         0.4921     1.9054 1726.6724   0.258    0.796
## taskquadruplet:pas3    0.3666     2.0816 1720.4025   0.176    0.860
## tasksingles:pas3       0.3560     1.5592 1727.9856   0.228    0.819
## taskpairs:pas2         0.4343     1.7897 1726.8105   0.243    0.808
## taskquadruplet:pas2    0.5845     1.8898 1663.2112   0.309    0.757
## tasksingles:pas2       0.3804     1.3794 1675.4298   0.276    0.783
## taskpairs:pas1         1.1935     1.7775 1717.3364   0.671    0.502
## taskquadruplet:pas1    0.2221     1.8775 1711.0815   0.118    0.906
## tasksingles:pas1       0.5702     1.3781 1486.2328   0.414    0.679
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng tskp:3 tskq:3 tsks:3 tskp:2 tskq:2 tsks:2
## taskqudrplt -0.682                                                        
## tasksingles -0.796  0.548                                                 
## taskprs:ps3 -0.898  0.617  0.720                                          
## tskqdrplt:3  0.000 -0.638  0.000  0.001                                   
## tsksngls:p3 -0.001  0.001 -0.496  0.000 -0.001                            
## taskprs:ps2 -0.957  0.657  0.767  0.865  0.001  0.000                     
## tskqdrplt:2 -0.003 -0.703  0.000  0.001  0.847 -0.001  0.003              
## tsksngls:p2 -0.004  0.000 -0.561  0.001  0.001  0.772  0.003  0.005       
## taskprs:ps1 -0.964  0.662  0.772  0.871  0.000  0.002  0.929  0.003  0.005
## tskqdrplt:1 -0.002 -0.707  0.000  0.001  0.852  0.000  0.002  0.940  0.004
## tsksngls:p1 -0.005  0.001 -0.563  0.002 -0.001  0.776  0.003  0.003  0.880
##             tskp:1 tskq:1
## taskqudrplt              
## tasksingles              
## taskprs:ps3              
## tskqdrplt:3              
## tsksngls:p3              
## taskprs:ps2              
## tskqdrplt:2              
## tsksngls:p2              
## taskprs:ps1              
## tskqdrplt:1  0.002       
## tsksngls:p1  0.007  0.004

Ex2.p3.i

Any additional random effects we added besides subject resulted in singular fits.

# here already it becomes a singular fit
m6 <- lmer(rt.obj ~ task + pas:task + (1 | subject) + (1 | task),
    REML = FALSE, data = samples_experiment)
## boundary (singular) fit: see ?isSingular
summary(m6)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: rt.obj ~ task + pas:task + (1 | subject) + (1 | task)
##    Data: samples_experiment
## 
##      AIC      BIC   logLik deviance df.resid 
##  11798.6  11880.4  -5884.3  11768.6     1713 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.310 -0.100 -0.054 -0.005 39.752 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. 
##  subject  (Intercept) 8.247e-02 2.872e-01
##  task     (Intercept) 1.268e-14 1.126e-07
##  Residual             5.306e+01 7.284e+00
## Number of obs: 1728, groups:  subject, 4; task, 3
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)            0.8696     1.7248 1487.2418   0.504    0.614
## taskquadruplet        -0.2514     2.5048 1727.9918  -0.100    0.920
## tasksingles           -0.2318     2.1467 1726.1882  -0.108    0.914
## taskpairs:pas3         0.4921     1.9054 1726.6724   0.258    0.796
## taskquadruplet:pas3    0.3666     2.0816 1720.4025   0.176    0.860
## tasksingles:pas3       0.3560     1.5592 1727.9856   0.228    0.819
## taskpairs:pas2         0.4343     1.7897 1726.8105   0.243    0.808
## taskquadruplet:pas2    0.5845     1.8898 1663.2112   0.309    0.757
## tasksingles:pas2       0.3804     1.3794 1675.4298   0.276    0.783
## taskpairs:pas1         1.1935     1.7775 1717.3364   0.671    0.502
## taskquadruplet:pas1    0.2221     1.8775 1711.0815   0.118    0.906
## tasksingles:pas1       0.5702     1.3781 1486.2328   0.414    0.679
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng tskp:3 tskq:3 tsks:3 tskp:2 tskq:2 tsks:2
## taskqudrplt -0.682                                                        
## tasksingles -0.796  0.548                                                 
## taskprs:ps3 -0.898  0.617  0.720                                          
## tskqdrplt:3  0.000 -0.638  0.000  0.001                                   
## tsksngls:p3 -0.001  0.001 -0.496  0.000 -0.001                            
## taskprs:ps2 -0.957  0.657  0.767  0.865  0.001  0.000                     
## tskqdrplt:2 -0.003 -0.703  0.000  0.001  0.847 -0.001  0.003              
## tsksngls:p2 -0.004  0.000 -0.561  0.001  0.001  0.772  0.003  0.005       
## taskprs:ps1 -0.964  0.662  0.772  0.871  0.000  0.002  0.929  0.003  0.005
## tskqdrplt:1 -0.002 -0.707  0.000  0.001  0.852  0.000  0.002  0.940  0.004
## tsksngls:p1 -0.005  0.001 -0.563  0.002 -0.001  0.776  0.003  0.003  0.880
##             tskp:1 tskq:1
## taskqudrplt              
## tasksingles              
## taskprs:ps3              
## tskqdrplt:3              
## tsksngls:p3              
## taskprs:ps2              
## tskqdrplt:2              
## tsksngls:p2              
## taskprs:ps1              
## tskqdrplt:1  0.002       
## tsksngls:p1  0.007  0.004
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
print(VarCorr(m6), comp = "Variance")
##  Groups   Name        Variance  
##  subject  (Intercept) 8.2468e-02
##  task     (Intercept) 1.2681e-14
##  Residual             5.3061e+01

Ex2.p3.iii. in your own words - how could you explain why your model would result in a singular fit?

In the example above, model m6, it results in a singular fit. Singular fits occur when the variances of some of the linear combinations of the effects are either zero or close to zero, and it is indicating that the model may be overfitting for the data. In model m6, we get the variance for task as 0, which results in a singular fit - this indicates to us that this model is overfitted to the data.

We expect that the reason that the model results in a singular fit is because we have too many parameters, and is therefore too too complex. As mentioned above, adding any additional random effects besides subject resulted in getting singular fits, so in order for the example above, m6, to not result in a singular fit, we would have to remove (1|Task).

Exercise 3

  1. Initialise a new data frame, data.count. count should indicate the number of times they categorized their experience as pas 1-4 for each task. I.e. the data frame would have for subject 1: for task:singles, pas1 was used # times, pas2 was used # times, pas3 was used # times and pas4 was used # times. You would then do the same for task:pairs and task:quadruplet
  2. Now fit a multilevel model that models a unique “slope” for pas for each subject with the interaction between pas and task and their main effects being modelled
    1. which family should be used?
    2. why is a slope for pas not really being modelled?
    3. if you get a convergence error, try another algorithm (the default is the Nelder_Mead) - try (bobyqa) for which the dfoptim package is needed. In glmer, you can add the following for the control argument: glmerControl(optimizer="bobyqa") (if you are interested, also have a look at the function allFit)
    4. when you have a converging fit - fit a model with only the main effects of pas and task. Compare this with the model that also includes the interaction
    5. indicate which of the two models, you would choose and why
    6. based on your chosen model - write a short report on what this says about the distribution of ratings as dependent on pas and task
    7. include a plot that shows the estimated amount of ratings for four subjects of your choosing
  3. Finally, fit a multilevel model that models correct as dependent on task with a unique intercept for each subject
    1. does task explain performance?
    2. add pas as a main effect on top of task - what are the consequences of that?
    3. now fit a multilevel model that models correct as dependent on pas with a unique intercept for each subject
    4. finally, fit a model that models the interaction between task and pas and their main effects
    5. describe in your words which model is the best in explaining the variance in accuracy

Answers

Exercise 3, part 1 - KV

data.count <- samples %>% select(subject, pas, task, correct) %>%
    group_by(subject, task, pas) %>%
    summarize(
        subject = subject[1],
        task = task[1],
        pas = pas[1],
        count = n(),
        accuracy = sum(correct) / n(),
        .groups = "drop")
head(data.count)
## # A tibble: 6 x 5
##   subject task       pas   count accuracy
##   <fct>   <fct>      <fct> <int>    <dbl>
## 1 001     pairs      4        12    1    
## 2 001     pairs      3         4    1    
## 3 001     pairs      2        45    0.756
## 4 001     pairs      1       109    0.532
## 5 001     quadruplet 4         1    1    
## 6 001     quadruplet 3         3    1

Exercise 3, part 2 EH

# multi-level with slope and interaction
m7 <- glmer(count ~
    pas + task + pas:task +
    (1 + pas | subject),
    data = data.count,
    family = poisson,
    control = glmerControl(optimizer = "bobyqa")) #added control, ex3.p2.iii
summary(m7)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: count ~ pas + task + pas:task + (1 + pas | subject)
##    Data: data.count
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   3148.4   3232.7  -1552.2   3104.4      318 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3871 -0.7853 -0.0469  0.7550  6.5438 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr             
##  subject (Intercept) 1.172    1.083                     
##          pas3        1.146    1.071    -0.80            
##          pas2        1.956    1.398    -0.97  0.79      
##          pas1        2.374    1.541    -0.96  0.70  0.92
## Number of obs: 340, groups:  subject, 29
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          3.262772   0.206184  15.825  < 2e-16 ***
## pas3                 0.259281   0.206702   1.254  0.20971    
## pas2                 0.749155   0.264869   2.828  0.00468 ** 
## pas1                 0.772932   0.290752   2.658  0.00785 ** 
## taskquadruplet      -0.100104   0.041920  -2.388  0.01694 *  
## tasksingles          0.332516   0.037868   8.781  < 2e-16 ***
## pas3:taskquadruplet  0.005989   0.059782   0.100  0.92020    
## pas2:taskquadruplet  0.101249   0.053853   1.880  0.06010 .  
## pas1:taskquadruplet  0.215004   0.052299   4.111 3.94e-05 ***
## pas3:tasksingles    -0.320468   0.056111  -5.711 1.12e-08 ***
## pas2:tasksingles    -0.368099   0.050973  -7.221 5.15e-13 ***
## pas1:tasksingles    -0.563462   0.051014 -11.045  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas3   pas2   pas1   tskqdr tsksng ps3:tskq ps2:tskq
## pas3        -0.797                                                     
## pas2        -0.969  0.786                                              
## pas1        -0.959  0.702  0.911                                       
## taskqudrplt -0.105  0.104  0.081  0.074                                
## tasksingles -0.107  0.107  0.083  0.076  0.526                         
## ps3:tskqdrp  0.073 -0.142 -0.057 -0.052 -0.701 -0.369                  
## ps2:tskqdrp  0.081 -0.081 -0.103 -0.058 -0.778 -0.410  0.546           
## ps1:tskqdrp  0.084 -0.084 -0.065 -0.093 -0.802 -0.422  0.562    0.624  
## ps3:tsksngl  0.072 -0.146 -0.056 -0.051 -0.355 -0.675  0.506    0.276  
## ps2:tsksngl  0.079 -0.079 -0.104 -0.056 -0.391 -0.743  0.274    0.513  
## ps1:tsksngl  0.079 -0.079 -0.062 -0.091 -0.391 -0.742  0.274    0.304  
##             ps1:tskq ps3:tsks ps2:tsks
## pas3                                  
## pas2                                  
## pas1                                  
## taskqudrplt                           
## tasksingles                           
## ps3:tskqdrp                           
## ps2:tskqdrp                           
## ps1:tskqdrp                           
## ps3:tsksngl  0.285                    
## ps2:tsksngl  0.313    0.501           
## ps1:tsksngl  0.507    0.501    0.551

Ex3.p2.i

For this model (m7) we used the poisson family because the poisson distribution is specifically used to describe probabilities of event frequencies.

Ex3.p2.ii

There is not really a slope being modelled for PAS as PAS is not continuous, it is a factor, so we only get esimates for each PAS level. As a factor, PAS is treated categorically (PAS1, PAS2, PAS3, PAS4), there is no such thing as PAS 1.3 etc and as such, it can not really be modelled as a slope.

m7.1 <- glmer(count ~
    pas + task + (1 + pas | subject),
    data = data.count,
    family = poisson,
    control = glmerControl(optimizer = "bobyqa"))
summary(m7.1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: count ~ pas + task + (1 + pas | subject)
##    Data: data.count
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   3398.5   3459.8  -1683.3   3366.5      324 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5885 -0.9001 -0.0477  0.8253  6.5100 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr             
##  subject (Intercept) 1.203    1.097                     
##          pas3        1.122    1.059    -0.80            
##          pas2        2.002    1.415    -0.97  0.80      
##          pas1        2.422    1.556    -0.96  0.71  0.92
## Number of obs: 340, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    3.340760   0.207576  16.094   <2e-16 ***
## pas3           0.153914   0.201722   0.763   0.4455    
## pas2           0.657300   0.266004   2.471   0.0135 *  
## pas1           0.663828   0.291951   2.274   0.0230 *  
## taskquadruplet 0.003294   0.018188   0.181   0.8563    
## tasksingles    0.004307   0.018177   0.237   0.8127    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas3   pas2   pas1   tskqdr
## pas3        -0.802                            
## pas2        -0.971  0.799                     
## pas1        -0.961  0.714  0.917              
## taskqudrplt -0.047  0.004  0.003  0.002       
## tasksingles -0.044  0.000  0.000  0.000  0.501
AIC(m7, m7.1)
##      df      AIC
## m7   22 3148.441
## m7.1 16 3398.549

Ex3.p2.v Indicate which of the two models, you would choose and why?

Between m7 and m7.1, we believe that the m7 is the best model. Out of the two, m7 is the only one thats predictors have any significance on the outcome, in particular, the interactions in m7 have significant effect on the count variable. When comparing the AIC output, it is also shown that model m7 has a better fit, with an AIC of 3148.441, compared to m7.1 of 3398.549. Taking all this into consideration, we are inclined to use this model m7 out of the two as it is the better model.

Ex3.p2.vi based on your chosen model - write a short report on what this says about the distribution of ratings as dependent on pas and task ?

m7 tells us that there is an important interaction between pas and task when it comes to the distribution of ratings, as all interactions reach significance, p < 0.05, except for pas3:taskquadruplet, p = 0.92020 and pas2:taskquadruplet, p = 0.06010

We can see from m7 that there is a decrease of all the interactions between pass’s and quadruplet tasks (pas:quadruplet), est. -0.0719. We see the opposite in the case all counts of interactions between pass and single tasks, where there is an increase of est. ~0.169.

Looking at pas in m7, there is an increase of count in all pas when going from pas4 to pas1 (0.772932), 2 (est. 0.749155) and 3 (est. 0.259281). Only the predictions for pas1 and pas2 are significant, p < 0.05.

Regarding task in our model, count decreases going from task pairs to task quadruplets, est. -0.100, and count increases going from task pairs to task singles, ~ est. 0.333. Both of these predications are significant, p < 0.05.

# get only specified subjects
data.count_subset <- data.count %>%
    filter(
        subject == "002" |
        subject == "009" |
        subject == "014" |
        subject == "015")
# predict using model for only selected subjects
m7_fitted <- predict(m7, newdata = data.count_subset)
data.count_subset$count_estimate <- expm1(m7_fitted)

print(m7_fitted)
##        1        2        3        4        5        6        7        8 
## 1.746384 4.579301 4.326593 2.422614 4.580446 4.441493 1.758433 4.543718 
##        9       10       11       12       13       14       15       16 
## 4.095647 2.489337 3.508646 3.974888 4.875774 2.389234 3.414531 3.976033 
##       17       18       19       20       21       22       23       24 
## 4.990674 2.821854 3.520694 3.939305 4.644828 3.270353 3.749821 4.537376 
##       25       26       27       28       29       30       31       32 
## 4.070358 3.170249 3.655707 4.538521 4.185258 3.602869 3.761869 4.501793 
##       33       34       35       36       37       38       39       40 
## 3.839412 2.990216 3.697020 4.274627 4.402163 2.890113 3.602906 4.275772 
##       41       42       43       44       45 
## 4.517063 3.322733 3.709068 4.239044 4.171217
# plot
data.count_subset %>% ggplot(aes(pas, count, color = task)) + #smaller dots to see the original ratings
    geom_point() +
  theme_bw() +
    geom_point(aes(x = pas, y = count_estimate, color = task,), shape = 18, size = 3) + #larger dots are to see the estimated ratings over top
    facet_wrap(data.count_subset$subject)

Exercise 3, part 3 - LR

m8 <- glmer(correct ~ task +
    (1 | subject),
    data = samples,
    family = binomial(link = "logit"))
summary(m8)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ task + (1 | subject)
##    Data: samples
## 
##      AIC      BIC   logLik deviance df.resid 
##  19927.2  19958.4  -9959.6  19919.2    18127 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7426 -1.0976  0.5098  0.6101  0.9111 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1775   0.4214  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.10071    0.08386  13.125  < 2e-16 ***
## taskquadruplet -0.09825    0.04190  -2.345    0.019 *  
## tasksingles     0.18542    0.04337   4.276 1.91e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr
## taskqudrplt -0.256       
## tasksingles -0.247  0.495
# seems task only explains a tiny amount of performance
piecewiseSEM::rsquared(m8)
##   Response   family  link method    Marginal Conditional
## 1  correct binomial logit  delta 0.002526141  0.03489787

Ex3.p3.i

Differences in the task seem to explain performance (number of correct answers) to a statistically significant level.

m9 <- glmer(correct ~ task + pas +
    (1 | subject),
    data = samples,
    family = binomial(link = "logit"))
summary(m9)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ task + pas + (1 | subject)
##    Data: samples
## 
##      AIC      BIC   logLik deviance df.resid 
##  17424.9  17479.5  -8705.5  17410.9    18124 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.4872 -0.6225  0.3240  0.5767  1.6144 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1979   0.4449  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     3.21262    0.11772  27.290   <2e-16 ***
## taskquadruplet -0.03055    0.04497  -0.679    0.497    
## tasksingles    -0.01059    0.04687  -0.226    0.821    
## pas3           -1.15022    0.09368 -12.279   <2e-16 ***
## pas2           -2.17254    0.08623 -25.195   <2e-16 ***
## pas1           -3.12732    0.08628 -36.248   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng pas3   pas2  
## taskqudrplt -0.190                            
## tasksingles -0.233  0.489                     
## pas3        -0.557  0.004  0.043              
## pas2        -0.625 -0.006  0.059  0.764       
## pas1        -0.637 -0.017  0.079  0.764  0.869
# better
piecewiseSEM::rsquared(m9)
##   Response   family  link method  Marginal Conditional
## 1  correct binomial logit  delta 0.2001362   0.2289711

Ex3.p3.ii

When we add PAS as a main effect on top of task, we can see that PAS actually has a significant effect on performance, whereas now task no longer displays a significant effect. This means that by including PAS as a main effect in the model, it altered the significance of task on performance.

m10 <- glmer(correct ~ pas +
    (1 | subject),
    data = samples,
    family = binomial(link = "logit"))
summary(m10)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ pas + (1 | subject)
##    Data: samples
## 
##      AIC      BIC   logLik deviance df.resid 
##  17421.4  17460.4  -8705.7  17411.4    18126 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.5665 -0.6243  0.3244  0.5754  1.6017 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1981   0.4451  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.19985    0.11404   28.06   <2e-16 ***
## pas3        -1.15049    0.09358  -12.29   <2e-16 ***
## pas2        -2.17365    0.08601  -25.27   <2e-16 ***
## pas1        -3.12940    0.08582  -36.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr) pas3   pas2  
## pas3 -0.567              
## pas2 -0.637  0.764       
## pas1 -0.647  0.764  0.868
# task makes negligible diff
piecewiseSEM::rsquared(m10)
##   Response   family  link method  Marginal Conditional
## 1  correct binomial logit  delta 0.2001568   0.2290084
m11 <- glm(correct ~ pas + task +
    pas:task,
    data = samples,
    family = binomial(link = "logit"))
summary(m11)
## 
## Call:
## glm(formula = correct ~ pas + task + pas:task, family = binomial(link = "logit"), 
##     data = samples)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5349  -1.2539   0.5089   0.7922   1.1028  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          2.97483    0.13461  22.100  < 2e-16 ***
## pas3                -1.04172    0.16112  -6.465 1.01e-10 ***
## pas2                -1.97686    0.14502 -13.631  < 2e-16 ***
## pas1                -2.76473    0.14218 -19.445  < 2e-16 ***
## taskquadruplet      -0.12316    0.18907  -0.651    0.515    
## tasksingles          0.19699    0.18347   1.074    0.283    
## pas3:taskquadruplet  0.01682    0.22714   0.074    0.941    
## pas2:taskquadruplet  0.07579    0.20373   0.372    0.710    
## pas1:taskquadruplet  0.12540    0.19928   0.629    0.529    
## pas3:tasksingles    -0.15153    0.22253  -0.681    0.496    
## pas2:tasksingles    -0.15565    0.19913  -0.782    0.434    
## pas1:tasksingles    -0.22907    0.19593  -1.169    0.242    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 20418  on 18130  degrees of freedom
## Residual deviance: 17853  on 18119  degrees of freedom
## AIC: 17877
## 
## Number of Fisher Scoring iterations: 5
piecewiseSEM::rsquared(m11)
##   Response   family  link     method R.squared
## 1  correct binomial logit nagelkerke       NaN
m12 <- glmer(correct ~ pas + task +
    pas:task + (1 | subject),
    data = samples,
    family = binomial(link = "logit"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0036472 (tol = 0.002, component 1)
summary(m12)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ pas + task + pas:task + (1 | subject)
##    Data: samples
## 
##      AIC      BIC   logLik deviance df.resid 
##  17431.0  17532.4  -8702.5  17405.0    18118 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.9339 -0.6276  0.3186  0.5750  1.6411 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1987   0.4458  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          3.13206    0.15977  19.604  < 2e-16 ***
## pas3                -1.04839    0.16303  -6.430 1.27e-10 ***
## pas2                -2.08499    0.14809 -14.080  < 2e-16 ***
## pas1                -3.04828    0.14599 -20.881  < 2e-16 ***
## taskquadruplet      -0.08775    0.18953  -0.463    0.643    
## tasksingles          0.23912    0.18410   1.299    0.194    
## pas3:taskquadruplet -0.03995    0.22823  -0.175    0.861    
## pas2:taskquadruplet  0.04490    0.20484   0.219    0.827    
## pas1:taskquadruplet  0.09419    0.20023   0.470    0.638    
## pas3:tasksingles    -0.21805    0.22376  -0.974    0.330    
## pas2:tasksingles    -0.25772    0.20058  -1.285    0.199    
## pas1:tasksingles    -0.29812    0.19720  -1.512    0.131    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas3   pas2   pas1   tskqdr tsksng ps3:tskq ps2:tskq
## pas3        -0.712                                                     
## pas2        -0.792  0.773                                              
## pas1        -0.808  0.783  0.874                                       
## taskqudrplt -0.601  0.590  0.650  0.658                                
## tasksingles -0.619  0.608  0.670  0.678  0.522                         
## ps3:tskqdrp  0.499 -0.705 -0.540 -0.547 -0.831 -0.433                  
## ps2:tskqdrp  0.557 -0.546 -0.702 -0.610 -0.926 -0.483  0.769           
## ps1:tskqdrp  0.569 -0.559 -0.616 -0.699 -0.947 -0.494  0.786    0.876  
## ps3:tsksngl  0.507 -0.718 -0.548 -0.555 -0.429 -0.823  0.512    0.397  
## ps2:tsksngl  0.566 -0.558 -0.715 -0.618 -0.479 -0.919  0.398    0.517  
## ps1:tsksngl  0.576 -0.567 -0.623 -0.707 -0.487 -0.934  0.404    0.451  
##             ps1:tskq ps3:tsks ps2:tsks
## pas3                                  
## pas2                                  
## pas1                                  
## taskqudrplt                           
## tasksingles                           
## ps3:tskqdrp                           
## ps2:tskqdrp                           
## ps1:tskqdrp                           
## ps3:tsksngl  0.406                    
## ps2:tsksngl  0.454    0.756           
## ps1:tsksngl  0.517    0.768    0.858  
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.0036472 (tol = 0.002, component 1)
piecewiseSEM::rsquared(m12)
##   Response   family  link method Marginal Conditional
## 1  correct binomial logit  delta 0.201629   0.2305216
AIC(m8, m9, m10, m12)
##     df      AIC
## m8   4 19927.21
## m9   7 17424.91
## m10  5 17421.39
## m12 13 17430.97

Ex3.p3.v Describe in your words which model is the best in explaining the variance in accuracy?

Of the above models, m12 is the best at explaining the variance in accuracy. But we think that model 10 is a bettter model to use because the difference in explained variance is negligible and it is a simpler model which means it is less prone to overfitting. Adding the random intercepts per subject allows for differences in subjects ability to correctly classify the stimuli. These difference in subjects may come from situational stress. Model m10 also has the lowest AIC score (17421.39), even though it has slightly more residual deviance than model 12, but this makes sense as more complex models are punished in this area.

Portfolio Assignment 2 part 2, Methods 3, 2021, autumn semester

Portfolio Assignment 2 part 2, Methods 3, 2021, autumn semester

EXERCISE 4 - Download and organise the data from experiment 1

Go to https://osf.io/ecxsj/files/ and download the files associated with Experiment 1 (there should be 29).
The data is associated with Experiment 1 of the article at the following DOI https://doi.org/10.1016/j.concog.2019.03.007

  1. Put the data from all subjects into a single data frame - note that some of the subjects do not have the seed variable. For these subjects, add this variable and make in NA for all observations. (The seed variable will not be part of the analysis and is not an experimental variable)
    1. Factorise the variables that need factorising
    2. Remove the practice trials from the dataset (see the trial.type variable)
    3. Create a correct variable
      . iv. Describe how the target.contrast and target.frames variables differ compared to the data from part 1 of this assignment

Answers

Exercise 4, part 1 - EH

#loading in the data
dat <- list.files(path = "./experiment_1", full.names = TRUE)


#Changing all the variables classes to the ones we want them to be 
samples <- map_df(dat, read_csv,
    trim_ws = TRUE, na = c("", "NA"), # i
    col_types = cols(
        trial.type = col_factor(),
        pas = col_factor(),
        trial = col_factor(),
        jitter.x = col_double(),
        jitter.y = col_double(),
        odd.digit = col_integer(),
        target.contrast = col_double(),
        target.frames = col_double(),
        cue = col_factor(),
        task = col_factor(),
        target.type = col_factor(),
        rt.subj = col_double(),
        rt.obj = col_double(),
        even.digit = col_integer(),
        seed = col_double(),
        obj.resp = col_factor(),
        subject = col_factor()
    )
)
## Warning: The following named parsers don't match the column names: seed

## Warning: The following named parsers don't match the column names: seed

## Warning: The following named parsers don't match the column names: seed
# removing all the practice trials #ii
exp1 <- samples[samples$trial.type == "experiment", ]
exp1 <- mutate(exp1, # iii
    correct = as.logical(
        ifelse(substr(target.type, 1, 1) == obj.resp, 1, 0)
    )
)
  1. In the previous experiment, the number of target frames was fixed at 3, whereas this time it the target frames was not fixed (they were integer numbers from 1-6), and in the last experiment where target.contrast was not fixed but for this experiment target.contrast was fixed at 0.1

EXERCISE 5 - Use log-likelihood ratio tests to evaluate logistic regression models

  1. Do logistic regression - correct as the dependent variable and target.frames as the independent variable. (Make sure that you understand what target.frames encode). Create two models - a pooled model and a partial-pooling model. The partial-pooling model should include a subject-specific intercept.

    1. the likelihood-function for logistic regression is: L(p)=i=1Npyi(1p)(1yi) (Remember the probability mass function for the Bernoulli Distribution). Create a function that calculates the likelihood.
    2. the log-likelihood-function for logistic regression is: l(p)=i=1N[yilnp+(1yi)ln(1p). Create a function that calculates the log-likelihood
    3. apply both functions to the pooling model you just created. Make sure that the log-likelihood matches what is returned from the logLik function for the pooled model. Does the likelihood-function return a value that is surprising? Why is the log-likelihood preferable when working with computers with limited precision?
    4. now show that the log-likelihood is a little off when applied to the partial pooling model - (the likelihood function is different for the multilevel function - see section 2.1 of https://www.researchgate.net/profile/Douglas-Bates/publication/2753537_Computational_Methods_for_Multilevel_Modelling/links/00b4953b4108d73427000000/Computational-Methods-for-Multilevel-Modelling.pdf if you are interested)
  2. Use log-likelihood ratio tests to argue for the addition of predictor variables, start from the null model, glm(correct ~ 1, 'binomial', data), then add subject-level intercepts, then add a group-level effect of target.frames and finally add subject-level slopes for target.frames. Also assess whether or not a correlation between the subject-level slopes and the subject-level intercepts should be included.

    1. write a short methods section and a results section where you indicate which model you chose and the statistics relevant for that choice. Include a plot of the estimated group-level function with xlim=c(0, 8) that includes the estimated subject-specific functions.
    2. also include in the results section whether the fit didn’t look good for any of the subjects. If so, identify those subjects in the report, and judge (no statistical test) whether their performance (accuracy) differed from that of the other subjects. Was their performance better than chance? (Use a statistical test this time) (50 %)
  3. Now add pas to the group-level effects - if a log-likelihood ratio test justifies this, also add the interaction between pas and target.frames and check whether a log-likelihood ratio test justifies this

    1. if your model doesn’t converge, try a different optimizer

    2. plot the estimated group-level functions over xlim=c(0, 8) for each of the four PAS-ratings - add this plot to your report (see: 5.2.i) and add a description of your chosen model. Describe how pas affects accuracy together with target duration if at all. Also comment on the estimated functions’ behaviour at target.frame=0 - is that behaviour reasonable?

Answers

Exercise 5, part 1 - KV

#creating a pooled model 
m1 <- glm(correct ~ target.frames, data = exp1,
    family = binomial(link = "logit"))
summary(m1)
## 
## Call:
## glm(formula = correct ~ target.frames, family = binomial(link = "logit"), 
##     data = exp1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6452   0.2478   0.3546   0.7039   1.2621  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.92992    0.03485  -26.69   <2e-16 ***
## target.frames  0.73296    0.01219   60.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26683  on 25043  degrees of freedom
## Residual deviance: 21730  on 25042  degrees of freedom
## AIC: 21734
## 
## Number of Fisher Scoring iterations: 5
#creating a partial-pooled model 
m2 <- glmer(correct ~ target.frames + (1 | subject),
    data = exp1, family = binomial(link = "logit"))
summary(m2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ target.frames + (1 | subject)
##    Data: exp1
## 
##      AIC      BIC   logLik deviance df.resid 
##  21250.1  21274.4 -10622.0  21244.1    25041 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.7520  0.1436  0.2604  0.4816  2.0730 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1549   0.3936  
## Number of obs: 25044, groups:  subject, 29
## 
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.95968    0.08125  -11.81   <2e-16 ***
## target.frames  0.75493    0.01251   60.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## target.frms -0.382
#5.i creating our own likelihood function
likelihood <- function(p, y) {
    prod((p^y) * (1 - p)^(1 - y))
}

#5.ii creating our own log-likelihood function 
log_likelihood <- function(p, y) {
    sum(y * log(p) + (1 - y) * log(1 - p))
}

#5.iii Getting some likelihood and log-likelihood values and double-checking that our function spits out the right values for m1
p <- fitted.values(m1)
likelihood(p, exp1$correct)
## [1] 0
log_likelihood(p, exp1$correct)
## [1] -10865.25
(m1_ll <- logLik(m1))
## 'log Lik.' -10865.25 (df=2)
# 5.iv Showing that the log-likelihood is slightly different when using _LogLik_ in comparison to our own function for m2 
p <- fitted.values(m2)
likelihood(p, exp1$correct)
## [1] 0
log_likelihood(p, exp1$correct)
## [1] -10565.53
(m2_ll <- logLik(m2))
## 'log Lik.' -10622.03 (df=3)

5.iii a likelihood value of 0 is returned for both m1 and m2, which is due to the lack of precision of the computer. This is why it is preferable to use the log-likelihood function instead.

5.iv The log-likelihood value returned from our function is -10565.53 whereas the value for the LogLik function is 10622.03, which can probably be explained by the fact that our function does not take multilevel modelling into considaration.

Exercise 5, part 2 - LR

#creating the null-model 
m0 <- glm(correct ~ 1, data = exp1, family = binomial(link = "logit"))


#adding subject-level intercepts 
m3 <- glmer(correct ~ 1 + (1 | subject),
    data = exp1, family = binomial(link = "logit"))

#adding subject-level intercepts and slopes 
m4 <- glmer(correct ~ target.frames + (target.frames | subject),
    data = exp1, family = binomial(link = "logit"))


#checking the log-likelihood of the different models 
(m0_ll <- logLik(m0))
## 'log Lik.' -13341.54 (df=1)
(m1_ll <- logLik(m1))
## 'log Lik.' -10865.25 (df=2)
(m2_ll <- logLik(m2))
## 'log Lik.' -10622.03 (df=3)
(m3_ll <- logLik(m3))
## 'log Lik.' -13157.55 (df=2)
(m4_ll <- logLik(m4))
## 'log Lik.' -10448.83 (df=5)
#creating some great plots to investigate the data 
ggplot(exp1, aes(target.frames, as.numeric(correct), color = correct)) +
    geom_point() +
    geom_point(aes(target.frames, fitted.values(m4)),
        shape = 3, color = "black", inherit.aes = FALSE) +
    facet_wrap(~subject) +
    labs(
        x = "Target frames",
        y = "Accuracy",
        title = "Logistic regression model")

#zooming in on subject 024 
(subj24_accuracy <- exp1 %>%
    filter(subject == "024") %>%
    summarise(accuracy = sum(correct) / n()))
## # A tibble: 1 x 1
##   accuracy
##      <dbl>
## 1    0.568

2.i+ii We have run binomial regression models, in which we tested the relationship between the variable correct predicted by target.frames. Different models using fixed and and random effects were created for comparison. The quality of the models Was compared by using the log-likelihood function.

Out of the 5 models, we choose m4 as our best model, as it has the highest LogLik value of -10676.02, with 5 degrees of freedom. After observing the plots, we have identified subject number 024 as having a bad fit. As the fit looks more linear than sigmoid, and has values all along the axis x = 0. Judging from this graph we predict that their performance is different to the other participants possibly having a more equal ratio of correct and incorrect, as they are the only subject that had a graph that resulted in looking more linear. We then took subject 024 and took the number of correct divided by trials to retrieve their accuracy score and compare it to chance, 50%. They have a accuracy percentage of 56.9%, which is only slightly higher than chance, which could explain why their graph looks poor.

Exercise 5, part 3 - VK

#Creating some models 
m5 <- glmer(correct ~ target.frames + pas + (target.frames | subject),
    data = exp1, family = binomial(link = "logit"))
(m5_ll <- logLik(m5))
## 'log Lik.' -9931.828 (df=8)
m6 <- glmer(correct ~ target.frames + pas +
    (pas * target.frames) + (target.frames | subject),
    data = exp1, family = binomial(link = "logit"), control = glmerControl(optimizer = "bobyqa"))
(m6_ll <- logLik(m6))
## 'log Lik.' -9742.039 (df=11)
summary(m6)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## correct ~ target.frames + pas + (pas * target.frames) + (target.frames |  
##     subject)
##    Data: exp1
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##  19506.1  19595.5  -9742.0  19484.1    25033 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -19.0101   0.0537   0.1606   0.4849   1.4465 
## 
## Random effects:
##  Groups  Name          Variance Std.Dev. Corr 
##  subject (Intercept)   0.03698  0.1923        
##          target.frames 0.02057  0.1434   -0.76
## Number of obs: 25044, groups:  subject, 29
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         0.08009    0.24320   0.329  0.74193    
## target.frames       0.87404    0.06792  12.869  < 2e-16 ***
## pas3               -0.74022    0.27065  -2.735  0.00624 ** 
## pas2               -0.77311    0.25133  -3.076  0.00210 ** 
## pas1               -0.20172    0.24613  -0.820  0.41246    
## target.frames:pas3 -0.01055    0.07335  -0.144  0.88565    
## target.frames:pas2 -0.31206    0.06850  -4.555 5.23e-06 ***
## target.frames:pas1 -0.75924    0.06749 -11.250  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trgt.f pas3   pas2   pas1   trg.:3 trg.:2
## target.frms -0.896                                          
## pas3        -0.869  0.759                                   
## pas2        -0.945  0.826  0.847                            
## pas1        -0.966  0.842  0.859  0.936                     
## trgt.frms:3  0.775 -0.776 -0.926 -0.761 -0.769              
## trgt.frms:2  0.841 -0.841 -0.762 -0.921 -0.839  0.790       
## trgt.frms:1  0.853 -0.850 -0.767 -0.838 -0.916  0.791  0.870
interactions::interact_plot(model = m6, pred = "target.frames", modx = "pas")

The log-likelihood of m4, m5 and m6 are these:

m4_ll = -10676.02, df=(5) m5_11 = -9931.828, df=(8) m6_ll = -9742.039, df=(11)

Thereby justifying the choice of model 6 as our preferred model

5.3 ii The chosen model, m6, is a binomial model in which the variable correct is predicted by target.frames, PAS and their interaction.

The estimate of target.frames shows that an increase in this variable “generally” increases accuracy - however, let’s look at the intercations_

We use this interaction between PAS and target duration because the lesser amount of frames (target duration) the less confident participants are (PAS).

Together, PAS with time duration (target.frames), the correctness gets worse as target.frames:PAS decreases (from PAS_4 to PAS_1). It can be read from the model summary that target.frames influences correctness less with lower PAS level (e.g. target.frames:PAS_1 = -0.75924 etc.)

The intercept of 0.08009, does sound strange as one might incorrectly expect it to be around 50%, but naturally this is because it is not possible to show a figure at a duration of 0 frames <3 The intercept of 0.08 is thus only meaningful within the model but not in reality.

EXERCISE 6 - Test linear hypotheses

In this section we are going to test different hypotheses. We assume that we have already proved that more objective evidence (longer duration of stimuli) is sufficient to increase accuracy in and of itself and that more subjective evidence (higher PAS ratings) is also sufficient to increase accuracy in and of itself.

We want to test a hypothesis for each of the three neighbouring differences in PAS, i.e. the difference between 2 and 1, the difference between 3 and 2 and the difference between 4 and 3. More specifically, we want to test the hypothesis that accuracy increases faster with objective evidence if subjective evidence is higher at the same time, i.e. we want to test for an interaction.

  1. Fit a model based on the following formula: correct ~ pas * target.frames + (target.frames | subject))

    1. First, use summary (yes, you are allowed to!) to argue that accuracy increases faster with objective evidence for PAS 2 than for PAS 1.
  2. summary won’t allow you to test whether accuracy increases faster with objective evidence for PAS 3 than for PAS 2 (unless you use relevel, which you are not allowed to in this exercise). Instead, we’ll be using the function glht from the multcomp package

    1. To redo the test in 6.1.i, you can create a contrast vector. This vector will have the length of the number of estimated group-level effects and any specific contrast you can think of can be specified using this. For redoing the test from 6.1.i, the code snippet below will do

    2. Now test the hypothesis that accuracy increases faster with objective evidence for PAS 3 than for PAS 2.

    3. Also test the hypothesis that accuracy increases faster with objective evidence for PAS 4 than for PAS 3

  3. Finally, test that whether the difference between PAS 2 and 1 (tested in 6.1.i) is greater than the difference between PAS 4 and 3 (tested in 6.2.iii)

Answers

Exercise 6, part 1 - EH

#creating the desired model
m7 <- glmer(correct ~ pas * target.frames +
    (target.frames | subject), data = exp1,
    family = binomial(link = "logit"),
    control = glmerControl(optimizer = "bobyqa"))
#getting that summary 
summary(m7)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ pas * target.frames + (target.frames | subject)
##    Data: exp1
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##  19506.1  19595.5  -9742.0  19484.1    25033 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -19.0102   0.0537   0.1606   0.4849   1.4465 
## 
## Random effects:
##  Groups  Name          Variance Std.Dev. Corr 
##  subject (Intercept)   0.03698  0.1923        
##          target.frames 0.02057  0.1434   -0.76
## Number of obs: 25044, groups:  subject, 29
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         0.08010    0.24575   0.326  0.74446    
## pas3               -0.74025    0.27355  -2.706  0.00681 ** 
## pas2               -0.77313    0.25370  -3.047  0.00231 ** 
## pas1               -0.20174    0.24871  -0.811  0.41729    
## target.frames       0.87404    0.06850  12.760  < 2e-16 ***
## pas3:target.frames -0.01054    0.07403  -0.142  0.88678    
## pas2:target.frames -0.31205    0.06904  -4.520 6.19e-06 ***
## pas1:target.frames -0.75924    0.06809 -11.151  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas3   pas2   pas1   trgt.f ps3:t. ps2:t.
## pas3        -0.871                                          
## pas2        -0.946  0.850                                   
## pas1        -0.966  0.862  0.937                            
## target.frms -0.898  0.763  0.829  0.845                     
## ps3:trgt.fr  0.780 -0.927 -0.766 -0.773 -0.780              
## ps2:trgt.fr  0.843 -0.766 -0.923 -0.841 -0.844  0.794       
## ps1:trgt.fr  0.856 -0.771 -0.841 -0.917 -0.853  0.794  0.872

6.1.i we see that the interaction-effect between pas and target.frames is clearly higher for PAS_2 vs PAS_1, thus indicating that accuracy increases faster with objective evidence for PAS_2 than for PAS_1.

Exercise 6, part 2 - KV

#trying out glht on m7 
glht(m7)
## 
##   General Linear Hypotheses
## 
## Linear Hypotheses:
##                         Estimate
## (Intercept) == 0         0.08010
## pas3 == 0               -0.74025
## pas2 == 0               -0.77313
## pas1 == 0               -0.20174
## target.frames == 0       0.87404
## pas3:target.frames == 0 -0.01054
## pas2:target.frames == 0 -0.31205
## pas1:target.frames == 0 -0.75924

Snippet for 6.2.i+ii+iii

## testing whether PAS 2 is different from PAS 1
contrast.vector <- matrix(c(0, 0, -1, 1, 0, 0, 0, 0), nrow = 1)
gh <- glht(m7, contrast.vector)
print(summary(gh))
## as another example, we could also test whether there is a difference in
## intercepts between PAS 2 and PAS 3
contrast.vector <- matrix(c(0, -1, 1, 0, 0, 0, 0, 0), nrow = 1)
gh <- glht(m7, contrast.vector)
print(summary(gh))

## PAS 4 and 3
contrast.vector <- matrix(c(-1, 1, 0, 0, 0, 0, 0, 0), nrow = 1)
gh <- glht(m7, contrast.vector)
print(summary(gh))

Exercise 6, part 3 - LR

#creating the contrast_vector 
K <- rbind(
    c(0, 0, -1, 1),
    c(-1, 1, 0, 0)
)

gh <- glht(m7, mcp(pas = K))
print(summary(gh))

6.3.i - as seen in the output there is a greater difference between PAS 2 and 1 than between 4 and 3.

EXERCISE 7 - Estimate psychometric functions for the Perceptual Awareness Scale and evaluate them

We saw in 5.3 that the estimated functions went below chance at a target duration of 0 frames (0 ms). This does not seem reasonable, so we will be trying a different approach for fitting here.

We will fit the following function that results in a sigmoid, f(x)=a+ba1+ecxd
It has four parameters: a, which can be interpreted as the minimum accuracy level, b, which can be interpreted as the maximum accuracy level, c, which can be interpreted as the so-called inflexion point, i.e. where the derivative of the sigmoid reaches its maximum and d, which can be interpreted as the steepness at the inflexion point. (When d goes towards infinity, the slope goes towards a straight line, and when it goes towards 0, the slope goes towards a step function).

We can define a function of a residual sum of squares as below

sigfit <- function(a, b, c, d, x) {
    a + (b - a) / (1 + exp((c - x) / d))
}

RSS <- function(dataset, par) {
    x <- dataset$x
    y <- dataset$y
    y.hat <- sigfit(par[1], par[2], par[3], par[4], x)
    RSS <- sum((y - y.hat)^2)
    return(RSS)
}
  1. Now, we will fit the sigmoid for the four PAS ratings for Subject 7

    1. use the function optim. It returns a list that among other things contains the four estimated parameters. You should set the following arguments:
      par: you can set c and d as 1. Find good choices for a and b yourself (and argue why they are appropriate)
      fn: which function to minimise?
      data: the data frame with x, target.frames, and y, correct in it
      method: ‘L-BFGS-B’
      lower: lower bounds for the four parameters, (the lowest value they can take), you can set c and d as -Inf. Find good choices for a and b yourself (and argue why they are appropriate)
      upper: upper bounds for the four parameters, (the highest value they can take) can set c and d as Inf. Find good choices for a and b yourself (and argue why they are appropriate)

    2. Plot the fits for the PAS ratings on a single plot (for subject 7) xlim=c(0, 8)

    3. Create a similar plot for the PAS ratings on a single plot (for subject 7), but this time based on the model from 6.1 xlim=c(0, 8)

    4. Comment on the differences between the fits - mention some advantages and disadvantages of each way

  2. Finally, estimate the parameters for all subjects and each of their four PAS ratings. Then plot the estimated function at the group-level by taking the mean for each of the four parameters, a, b, c and d across subjects. A function should be estimated for each PAS-rating (it should look somewhat similar to Fig. 3 from the article: https://doi.org/10.1016/j.concog.2019.03.007)

    1. compare with the figure you made in 5.3.ii and comment on the differences between the fits - mention some advantages and disadvantages of both.

Answers

Exercise 7, part 1 - VK

#filtering out subject 7 
subj7 <- exp1[exp1$subject == "007", ]

#choosing our desired variables 
subj7xy <- data.frame(
    x = subj7$target.frames,
    y = subj7$correct,
    pas = subj7$pas)


#Setting up the function and using it on the four pas_levels


#PAS 1
s7p1 <- optim(
    par = c(0.5, 1, 1, 1), # we set a to be 0.5 as we expect chance level to be the appropriate minimum accuracy 
    fn = RSS,
    data = subj7xy[subj7xy$pas == "1", ],
    method = "L-BFGS-B",
    lower = c(0.5, 0.5, -Inf, -Inf),
    upper = c(1, 1, Inf, Inf)
)
p1fit <- sigfit(
    a = s7p1$par[1],
    b = s7p1$par[2],
    c = s7p1$par[3],
    d = s7p1$par[4],
    x = subj7xy[subj7xy$pas == "1", ]$x
)

#PAS 2
s7p2 <- optim(
    par = c(0.5, 1, 1, 1),
    fn = RSS,
    data = subj7xy[subj7xy$pas == "2", ],
    method = "L-BFGS-B",
    lower = c(0.5, 0.5, -Inf, -Inf),
    upper = c(1, 1, Inf, Inf)
)

p2fit <- sigfit(
    a = s7p2$par[1],
    b = s7p2$par[2],
    c = s7p2$par[3],
    d = s7p2$par[4],
    x = subj7xy[subj7xy$pas == "2", ]$x
)

#PAS 3
s7p3 <- optim(
    par = c(0.5, 1, 1, 1),
    fn = RSS,
    data = subj7xy[subj7xy$pas == "3", ],
    method = "L-BFGS-B",
    lower = c(0.5, 0.5, -Inf, -Inf),
    upper = c(1, 1, Inf, Inf)
)

p3fit <- sigfit(
    a = s7p3$par[1],
    b = s7p3$par[2],
    c = s7p3$par[3],
    d = s7p3$par[4],
    x = subj7xy[subj7xy$pas == "3", ]$x
)

#PAS 4 
s7p4 <- optim(
    par = c(0.5, 1, 1, 1),
    fn = RSS,
    data = subj7xy[subj7xy$pas == "4", ],
    method = "L-BFGS-B",
    lower = c(0.5, 0.5, -Inf, -Inf),
    upper = c(1, 1, Inf, Inf)
)

p4fit <- sigfit(
    a = s7p4$par[1],
    b = s7p4$par[2],
    c = s7p4$par[3],
    d = s7p4$par[4],
    x = subj7xy[subj7xy$pas == "4", ]$x
)


length(subj7xy[subj7xy$pas == "1", ]$x)
## [1] 183
length(p1fit)
## [1] 183
subj7xy$yhat <- NA
subj7xy[subj7xy$pas == "1", ]$yhat <- p1fit
subj7xy[subj7xy$pas == "2", ]$yhat <- p2fit
subj7xy[subj7xy$pas == "3", ]$yhat <- p3fit
subj7xy[subj7xy$pas == "4", ]$yhat <- p4fit

#plotting the fits of the sigmoid 
ggplot(aes(x = x, y = as.numeric(y), color = pas), data = subj7xy) +
    geom_jitter(width = 0.25, height = 0.05) +
    geom_line(aes(y = yhat)) +
    geom_point(aes(y = yhat), shape = 3, size = 3) +
    geom_text(aes(y = yhat, label = round(yhat, 2)),
        color = "black", nudge_x = -0.1, nudge_y = 0.005
    ) +
    labs(x = "Target frames", y = "Accuracy", title = "Subject 7")

s7m7fit <- predict(m7, newdata = subj7, type = "response")
 
#plotting m7 
ggplot(aes(x = target.frames, y = as.numeric(correct),color = pas),
    data = subj7) +
    geom_jitter(width = 0.25, height = 0.05) +
    geom_line(aes(y = s7m7fit))

7.1.iv As we plotted the predicted values of the fits, the intercepts at target.frames=0 is not shown, though this would have been a crucial difference between the plots.The visible difference between the plots must primarily stem from the fact that we manually set the lowest possible value to 0.5 in our optim function. The advantage of manually creating the function is that we can control for the fact that below chance level accuracy is highly unlikely and thus avoid an uninterpretable intercept, however we might be prone to bias as we set the parameters ourselves. On the other hand, using the binomial model might prone to less bias as we do not decide that any parameters, but might return (as in this case) values which are uninterpretable (as the intercept in this).It should however be noticed that the plot of m7 is based upon all the subject data whereas the “optim”-plot is only based on subject 7, thus making them not entirely comparable.

Exercise 7, part 2 - EH

sigmodel_pas <- function(pas_lev, subjx) {
    dat <- data.frame(
        x = subjx[subjx$pas == pas_lev, ]$target.frames,
        y = subjx[subjx$pas == pas_lev, ]$correct)
    subjxpx_opt <- optim(
        par = c(0.5, 1, 1, 1),
        fn = RSS,
        data = dat,
        method = "L-BFGS-B",
        lower = c(0.5, 0.5, -Inf, -Inf),
        upper = c(1, 1, Inf, Inf)
    )

    list(
        subjxpx_opt$par[1],
        subjxpx_opt$par[2],
        subjxpx_opt$par[3],
        subjxpx_opt$par[4]
    )
}

sigmodel <- function(x, dat, paslevs) {
    subjx <- dat[dat$subject == x, ]
    sapply(paslevs, sigmodel_pas, subjx = subjx,
        USE.NAMES = FALSE, simplify = FALSE)
}

fit_sigmodel <- function(pas, dat, params) {
    sigfit(
        a = params[, pas][1],
        b = params[, pas][2],
        c = params[, pas][3],
        d = params[, pas][4],
        x = dat[dat$pas == pas, ]$target.frames
    )
}

pars <- c("a", "b", "c", "d")
subjects <- levels(exp1$subject)
paslevs <- levels(exp1$pas)
N <- length(subjects)

sigmodel_params <- array(
    unlist(sapply(
        subjects,
        sigmodel,
        dat = exp1,
        paslevs = paslevs,
        USE.NAMES = FALSE,
        simplify = FALSE
    )),
    dim = c(4, 4, N), dimnames = list(pars, paslevs, subjects)
)

inter_subj_par_means_by_pas <- rowMeans(sigmodel_params, dims = 2)



sig_fitted <- sapply(
    paslevs,
    fit_sigmodel,
    dat = exp1,
    params = inter_subj_par_means_by_pas
)

exp1$y_hat <- NA
exp1[exp1$pas == 1, ]$y_hat <- sig_fitted$`1`
exp1[exp1$pas == 2, ]$y_hat <- sig_fitted$`2`
exp1[exp1$pas == 3, ]$y_hat <- sig_fitted$`3`
exp1[exp1$pas == 4, ]$y_hat <- sig_fitted$`4`

ggplot(exp1, aes(target.frames, as.numeric(correct), color = pas)) +
    stat_summary(fun.data = "mean_cl_boot", size = 1.5) +
    geom_line(
        aes(x = target.frames, y = y_hat, linetype = "Predicted"),
        size = 1.5
    ) +
    labs(
        x = "Target frames",
        y = "Accuracy",
        color = "PAS rating",
        linetype = "Model"
    )

interactions::interact_plot(model = m6, pred = "target.frames", modx = "pas")

summary(m6)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## correct ~ target.frames + pas + (pas * target.frames) + (target.frames |  
##     subject)
##    Data: exp1
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##  19506.1  19595.5  -9742.0  19484.1    25033 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -19.0101   0.0537   0.1606   0.4849   1.4465 
## 
## Random effects:
##  Groups  Name          Variance Std.Dev. Corr 
##  subject (Intercept)   0.03698  0.1923        
##          target.frames 0.02057  0.1434   -0.76
## Number of obs: 25044, groups:  subject, 29
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         0.08009    0.24320   0.329  0.74193    
## target.frames       0.87404    0.06792  12.869  < 2e-16 ***
## pas3               -0.74022    0.27065  -2.735  0.00624 ** 
## pas2               -0.77311    0.25133  -3.076  0.00210 ** 
## pas1               -0.20172    0.24613  -0.820  0.41246    
## target.frames:pas3 -0.01055    0.07335  -0.144  0.88565    
## target.frames:pas2 -0.31206    0.06850  -4.555 5.23e-06 ***
## target.frames:pas1 -0.75924    0.06749 -11.250  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trgt.f pas3   pas2   pas1   trg.:3 trg.:2
## target.frms -0.896                                          
## pas3        -0.869  0.759                                   
## pas2        -0.945  0.826  0.847                            
## pas1        -0.966  0.842  0.859  0.936                     
## trgt.frms:3  0.775 -0.776 -0.926 -0.761 -0.769              
## trgt.frms:2  0.841 -0.841 -0.762 -0.921 -0.839  0.790       
## trgt.frms:1  0.853 -0.850 -0.767 -0.838 -0.916  0.791  0.870

7.2.i Of course the plot for the optim function looks a bit wonky (as it is based on the predicted values), but the shape of the slope can still be assessed. A crucial difference between the plots is the slope of PAS 1, which is positive for m6 and negative for the plot of the optim function. As in the previous exercise, the advantage of the optim function is that you can control the parameters resulting in avoidance of below chance level estimates - but as we decide the parameters it might be prone to bias. For m6, we use the inbuilt binomial function making us less prone to bias, but giving us estimates which are uniterpretable.

assignment_3, Methods 3, 2021, autumn semester

assignment_3, Methods 3, 2021, autumn semester

Portfolio Assignment 3, Methods 3, 2021, autumn semester

Study Group 8, Emma Rose Hahn (EH), Viara Krasteva (VK), Kristian Nøhr Villebro (KV), Luke Ring (LR)

Exercise 1

1.1 Describe data structure (EH)

1.1.i There are 682 repetitions, 102 sensors and 251 timestamps

Add time offset (ms) (EH)

Create covariance matrix (VK)

1.1.iii It looks like there is a lot of covariance (which would make sense), however it is quite difficult to get a clear impression with such big covariance matrix.

Create signal averages across repititions (KV)

Find and plot maximum response channel (KV)

1.1.vi We found that the maximum response signal was received by sensor 74 (as python index starts at 0)

1.1.viii The messy plot showing the single repetitions includes a lot of fluctuation around the y-axis, thus including lots of noise, making it difficult to assess anything from. Even though the messy plot looks as if the activity is more or less the same across the whole time period, the averaged plot reveals that activation is quite different over time. As the signal noise goes in both negative and positive direction, they cancel each other out thus resulting in the averaged plot we have seen.

load pas_vector.npy (LR)

1.2.i The pas_vector has length 682, thus having the same length as repetitions.

1.2.iii One would expect that the lower the PAS level the lower the response signal – e.g., PAS 1, not perceiving anything, would result in a low signal, and PAS 4 would result in the highest signal. This seems to correlate with what we see in plot. However, surprisingly PAS 2 has a higher max at 250ms than PAS 4 (which we would expect would have the highest).

Exercise 2

2.1 Logistic regression (EH)

2.1.v As we get a score of 1.0, it seems like an overfitting situation – besides the score of 1.0 we also suspect overfitting based on the lack of model-penalties.

2.1.vi We found there to be 215 non-zero coefficients.

2.1.vii It could seem as there is less covariance, it is however super difficult to assess this from these complex plots still. <3

Exercise 2.2 cross validation (VK)

Stratified 5-fold cross validation logistic model (KV)

Stratified 5-fold cross validation logistic model with l2 (LR)

2.2.iv It seems as if C=1e-05 gives the worst score at 0.455 and C=10.0 gives the best score at 0.475. The difference between them is thus 0.02. It is however surprising that C=1e5 seems to give almost the exact same number as C=1e1. This indicates that reularization is not making a large difference for these data.

now fit a model for each time sample using best c (EH)

2.2.v The best score 61.1% is at timestamp = 176 ms. Chance level should be at 50% as we are predicting PAS 1 and PAS 2

now fit a model for each time sample using c=1e-1 and l1 (VK)

2.2.vi classification accuracy was best (61.62%) at time point -56ms.

fit again but for PAS 1 and 4 (KV)

2.2.vii Best classification (61.58%) is at time point 112ms. Chance level for this analysis should be 50% as we are working with PAS 1 and PAS 4.

2.3 As our scores is all above chance level (however not a lot) it seems to be possible to do pairwise classification. It is quite surprising that the accuracy level of PAS 1 vs 2 and PAS 1 vs 4 are almost identical, though at very different time points.

EXERCISE 3 - Do a Support Vector Machine Classification on all four PAS-ratings

3.1 Support Vector Machine Classification (LR)

Linear kernel classifier (LR)

Radial basis (EH)

3.1.iv) Which one is better predicting the category?

Based on the cross-validation results, the linear kernel classifier achieved 29.9% accuracy, and the radial basis kernel classifier achieved 33.4% accuracy, so for these data the radial basis kernel classifier is the best predictor.

Sample-by-sample analysis (VK)

3.1.iv) Is classification of subjective experience possible at around 200-250 ms?

It seems that the model would perform above random chance at classification of subjective experience for 200-250ms although the accuracy is still not high, ranging from around 0.27-0.34.

Exercies 3.2 Test/train (KV)

3.2.iii) Based on the confusion matrix, describe how ratings are misclassified and if that makes sense given that ratings should measure the strength/quality of the subjective experience. Is the classifier biased towards specific ratings?

The confusion matrix shows a bias towards predicting PAS 3, which can be seen from the PAS 3 predicted label on the x axis, this should not come from PAS 3 being the most frequent rating, because the samples were stratified. Of the predicted labels classified as PAS 1, 2 and 4, the prediction was correct most frequently, which fits with subjects indicating that they either not aware of the stimulus (PAS 1) or did not doubt their experience of the stimulus (PAS 4). The PAS 3 label was most frequently misclassified, especially when the true label was PAS 2, but both 2 and 3 ratings that relate to uncertainty and ambiguity.

assignment_4, Methods 3, 2021, autumn semester

assignment_4, Methods 3, 2021, autumn semester

Portfolio Assignment 4, Methods 3, 2021, autumn semester

Study Group 8, Emma Rose Hahn (EH), Viara Krasteva (VK), Kristian Nøhr Villebro (KV), Luke Ring (LR)

Exercise 1 - Use principal component analysis to improve the classification of subjective experience

1.1 - Create a covariance matrix, find the eigenvectors and the eigenvalues (EH)

1.1.vi) What does the off-diagonal activation imply about the independence of the signals measured by the 102 sensors? (KV)

We see that the off-diagonal activation implies that the signals measured by the sensors are not independent. This makes sense as activity in the brain can involve multiple regions, and we would expect to see some correlation and covariance between sensors.

1.2 - Create the weighting matrix W and the projected data, Z

1.2.iv) are there some values that stand out from the rest? (LR)

The last 5 of the sorted log eigenvalues are much smaller than the previous 97. This matches the result of our matrix rank test which gives values above a threshold.

1.2.vii What has happened to the off-diagonals and why? (LR)

The weighting matrix is the orthogonal projection of the data onto the eigenvectors, this results in values that are not correlated with each other, giving a 0 value for the covariance matrix.

2 - Use logistic regression with cross-validation to find the optimal number of principal components (VK)

2.1.ii) what is the general trend and why is this so? (EH)

Based on the above plot we could say the general trend is that addming more principle components results in a higher classification accuracy. This is because more variance is accounted for, but it also could mean that the model is overfitting.

2.1.iii) In terms of classification accuracy, what is the effect of adding the five last components? Why do you think this is so? (KV)

The last 5 components have a negligible effect, this is because they are the principle components that have the least variance, adding them to our models doesn't provide much additional information.

2.2.ii) how is this plot different from the one in Exercise 2.1.ii? (VK)

This shows a different trend, rather than each additional component increasing the model accuracy, we see a peak at 26 components and after that a decline in accuracy which then levels out at around 70 components. Based on the first model we might assume that adding all the components would result in a higher accuracy, where as using cross-validation we see that the results of the previous model were likely due to overfitting.

2.2.iii) What is the number of principal components, kmax_accuracy, that results in the greatest classification accuracy when cross-validated?

The peak model performance is 26 components, at around 36% accuracy.

2.2.iv) How many percentage points is the classification accuracy increased with relative to the to the full-dimensional, d, dataset? (EH)

using all of the components, the 5-fold cross-validation accuracy is increased by about 6.3%.

2.2.v) How do the analyses in Exercises 2.1 and 2.2 differ from one another? Make sure to comment on the differences in optimization criteria. (KV)

Both use a logistic regression model, the key difference here is that the first analysis is adding components and fitting the model to the entire dataset. 2.2 uses stratified 5-fold cross-validation which tries to ensure balanced classes across the folds, as well as training and testing on different subsets of the data. This is a more robust method of testing the model, as it is less prone to overfitting and while the classification accuracy may seem lower, the accuracy in 2.1 is likely overinflated.

2.3.iii) Describe the differences between the two analyses - focus on the time interval between 0 ms and 400 ms - describe in your own words why the logistic regression performs better on the PCA-reduced dataset around the peak magnetic activity (EH)

The PCA reduced dataset performance tends to be higher than the full dataset, especially around peak activity. The full dataset contains the signal from all the sensors, but some of these may be less relevant to the response being investigated thereby introducing noise into the model. As hypothesized in 2.2.v, the overall accuracy of the full dataset was inflated due to overfitting which reduced its ability to be generalized to other time intervals. Unsurprisingly, as the PCA model underwent more robust validation, the accuracy at the previously modelled timestamp (248ms) remained the same (~36%), and seems to be able to generalize to time intervals around the peak activity, where the signal is likely similar.